News:

11 March 2016 - Forum Rules

Main Menu

Image decompression routine help (also: x86 code)

Started by takashio, May 21, 2012, 02:26:36 PM

Previous topic - Next topic

takashio

Helllo everyone, this is your typical post about a person having issues with a decompression algorithm.

In particular, we're talking compression for a 2002 PC Game (a shopsim with several graphical UI elements I want to translate). The format itself is "known", as it's the KG format used in BISHOP visual novels.

Here is a sample:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  4B 47 02 01 B0 00 40 00 00 00 00 00 30 00 00 00  KG..°.@.....0... | Header, indicates it's 8-bit pal (or 24-bit in other files)
00000010  30 04 00 00 62 06 00 00 00 00 00 00 00 00 00 00  0...b........... | 0x0430 start, 0x0662 end of compressed data
00000020  00 00 00 00 B0 00 00 00 40 00 00 00 00 00 00 00  ....°...@....... | rest is size info
00000030  00 00 00 00 39 4A 6B 00 4A 52 63 00 EF CE 8C 00  ....9Jk.JRc.ïÎŒ. | 256-color Pallete
00000040  C6 A5 F7 00 F7 D6 8C 00 E7 F7 EF 00 EF FF F7 00  Æ¥÷.÷ÖŒ.ç÷ï.ïÿ÷. |
00000050  FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00  ÿÿÿ............. |

(empty palette entries)

00000430  01 01 F0 17 90 88 04 F0 F0 0B F8 00 00 4F 80 A0  ..ð..ˆ.ðð.ø..O€
00000440  00 00 B1 49 05 C5 82 78 70 42 70 60 84 E2 C1 3C  ..±I.Å,xpBp`,,âÁ<
00000450  28 21 38 70 41 70 E0 9E 2C 10 5C 28 21 38 01 AC  (!8pApàž,.\(!8.¬
00000460  02 B8 1A F3 05 CC F1 38 4F 3E F4 28 EE 67 83 04  .¸.ó.Ìñ8O>ô(îgƒ.
00000470  37 09 E2 73 3C 18 35 E6 0B 99 E1 70 B0 6B CC 17  7.âs<.5æ.™áp°kÌ.
00000480  33 C2 78 9C CF 00 2D 80 6A 73 AF 33 C2 82 1B 05  3ÂxœÏ.-€js¯3Â,..
00000490  05 F4 0D 10 10 9C E8 7C E5 CF 85 CE 8B D6 BC DF  .ô...œè|åÏ...΋ּß
000004A0  0B 97 3E 26 06 08 6C 0C 10 DC 1E 01 AD CE BC CF  .—>&..l..Ü...μÏ
000004B0  09 E2 75 80 0C 02 2B 20 40 0A F3 3C 1E 40 82 EB  .âu€..+ @.ó<.@,ë
000004C0  05 04 17 58 1F 3E 66 01 61 CE BC C1 73 A3 02 0B  ...X.>f.aμÁs£..
000004D0  05 04 27 3A 20 21 B0 01 8E 7C 4E 13 5E 60 B8 50  ..': !°.Ž|N.^`¸P
000004E0  43 60 7C EB CC 17 3A 1F 5A F3 05 CE 80 31 CE 01  C`|ëÌ.:.Zó.΀1Î.
000004F0  61 CF 80 00 08 3E 58 2E 2C 13 C3 82 13 83 04 27  aÏ€..>X.,.Ã,.ƒ.'
00000500  16 09 E1 41 09 C3 82 0B 87 04 F1 60 82 E1 41 09  ..áA.Ã,.‡.ñ`,áA.
00000510  C4 82 78 9C 02 B8 1A F3 05 CC F1 38 4F 3E F4 28  Ä,xœ.¸.ó.Ìñ8O>ô(
00000520  EE 67 83 04 37 09 E2 73 3C 18 35 E6 0B 99 E1 70  îgƒ.7.âs<.5æ.™áp
00000530  B0 6B CC 17 33 C2 78 9C CF 0F 80 70 73 AF 33 C2  °kÌ.3ÂxœÏ.€ps¯3Â
00000540  82 1B 05 05 F4 0D 10 10 9C E8 7C E5 CF 85 CE 8B  ,...ô...œè|åÏ...΋
00000550  D6 BC DF 0B 97 3E 26 06 08 6C 0C 10 DC 1E 01 AD  Ö¼ß.—>&..l..Ü...
00000560  CE BC CF 09 E2 75 80 0C 02 2B 20 40 0A F3 3C 1E  μÏ.âu€..+ @.ó<.
00000570  40 82 EB 05 04 17 58 1F 3F F4 08 6C 02 C3 9D 79  @,ë...X.?ô.l.Ã.y
00000580  82 E7 46 04 16 0A 08 4E 74 40 43 60 03 1C F8 9C  ,çF....Nt@C`..øœ
00000590  26 BC C1 70 A0 86 C0 F9 D7 98 2E 74 3E B5 E6 0B  &¼Áp †Àùט.t>µæ.
000005A0  9D 1F 9C 02 FB 9F 00 00 10 7C D0 5C 58 27 87 04  ..œ.ûŸ...|Ð\X'‡.
000005B0  27 06 08 4E 2C 13 C2 82 13 87 04 17 0E 09 E2 C1  '..N,.Â,.‡....âÁ
000005C0  05 C2 82 13 85 04 37 0B 80 57 03 5E 60 B9 9E 27  .Â,.....7.€W.^`¹ž'
000005D0  09 E7 DE 85 1D CC F0 60 86 E1 3C 4E 67 83 06 BC  .çÞ....Ìð`†á<Ngƒ.¼
000005E0  C1 73 3C 2E 16 0D 79 82 E6 78 4F 13 99 E1 3C CC  Ás<...y,æxO.™á<Ì
000005F0  03 53 9D 79 9E 14 10 D8 28 2F A0 68 80 84 E7 43  .S.yž..Ø(/ h€,,çC
00000600  E7 2E 7C 2E 74 5E B5 E6 F8 5C B9 F1 30 30 43 60  ç.|.t^µæø\¹ñ00C`
00000610  E0 9E 1F 00 D6 E7 5E 67 84 F1 3A C0 06 01 15 90  àž..Öç^g,,ñ:À....
00000620  20 05 79 9E 0F 20 41 75 82 82 0B AC 0C 10 DC 1E   .yž. Au,,.¬..Ü.
00000630  01 61 CE BC C1 73 A3 02 0B 05 04 27 3A 20 21 B0  .aμÁs£....': !°
00000640  01 8E 7C 4E 13 5E 60 B8 50 43 60 7C EB CC 17 3A  .Ž|N.^`¸PC`|ëÌ.:
00000650  1F 5A F3 05 CE 87 D7 FD 02 23 00 B0 E7 C0 00 01  .Zó.·×ý.#.°çÀ..
00000660  59 00                                            Y.

This generates the following image:

(this is a bmp file - link here, original KG file here).

Of course, if anyone can recognize what compression is used, I'd be ecstatic. By messing around with the compressed stream, i know the first two bytes are always the first two bytes of the uncompressed data, that it likely uses a sliding window (ie. it's some form of LZ77), that it's not byte-aligned, and that the order of the bytes on the decompressed stream is identical to the bitmap in the BMP file. Here's also an example of how some of the first bytes of the input stream seem to work:

01 01 F0 17 90 88 04 F0 - write 01 01 bytes, then duplicate the buffer and write "01 01 01 01" 0x17 times then [???] write 08 (from 90 88 ) one time + 0x4F times.

Failing that, there is a decompressor already, in the form of a plug-in for the japanese image viewer Susie. It can be found in the next link under BISHOP PK/PFT/KG files plug-in version 0.02, and that is how I got the output BMP. The SPI plug-in files are just renamed DLLs, and the interface is known. This would make a second way to figure out the compression/decompression algorithm...  but I'd like to consider this more of a last ditch effort.

Does anyone has any ideas/pointers on how to advance on either of these methods? Or maybe feel like suggesting me a completely different method to go around this hurdle?

LostTemplar

You could try and contact the author of that plugin. His mail is in the readme, but I have no idea whether it's still valid.

takashio

He has two emails, the one linked to his ISP and a domain he registers and renewed last year.
I mailed him yesterday, but I'm still waiting for a reply (being in japan and all, different timezones, not to mention talking about 11-year old code). In the meantime, since this is the last step for the tools (I already wrote the unpacker for the script and image/data files, and the game does not need a font fix), I might as well explore all alternatives.

LostTemplar

Well, in the case he doesn't answer, I'd recommend disassembling the DLL. It's not that big and in a relatively disassembly-friendly environment like Windows DLLs it should be a lot easier than trying to figure out the format yourself.

takashio

That was why I was hoping the format would seem familiar... Well it was worth a try.

I'm fine with disassembly and debugging, it's just that I really, really hate the x86 instruction set with a passion.
Had a small hope someone here would have more patience than me as regards this sort of reverse engineering, but I'll soldier on.
At any rate, I'll keep this thread updated - someone might be interested in the process/compression in the future.

LostTemplar

I can sympathize with that notion. x86 is just too bloated. I personally like RISC instruction sets a lot more.

Being 256 colors and some kind of LZ* compression, I thought it might be a GIF derivate, but it looks a whole lot different. I tried to deduce some rules to the compression but couldn't really get any farther than you did.

BRPXQZME

Yes, please document here or on the wiki (you can use the same account credentials)—it's the kind of thing we like to see around here. :)
we are in a horrible and deadly danger

kingshriek

I haven't disassembled the entire decompression routine, but I did notice that it copies 8 bits directly to a byte of the output bmp if those bits are preceded by the bits "01". This holds for all but the first two bytes of the input data (after header and palette table) which are just copied directly to the output. If you don't care about the large output file size, this suggests an easy "compression" algorithm.

takashio

#8
Yeah, I noticed a basic pattern like that yesterday. There are a couple more commands. one of them being "copy previous byte X number of times" (i'm guessing this is "command" bits "10"). And this seems to be a basic RLE until I change the first two bytes, let's say to 0x0303, and the plug-in delivers (starting at 0x0436):

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000430  00 00 00 00 00 00 03 03 03 03 03 03 03 03 03 03  ................
00000440  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000450  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000460  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000470  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000480  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000490  03 03 03 03 03 03 08 08 08 08 08 08 08 08 08 08  ................
000004A0  08 08 08 08 08 08 08 08 08 08 08 08 08 08 08 08  ................
000004B0  08 08 08 08 08 08 08 08 08 08 08 08 08 08 08 08  ................
000004C0  08 08 08 08 08 08 08 08 08 08 08 08 08 08 08 08  ................
000004D0  08 08 08 08 08 08 08 08 08 08 08 08 08 08 08 08  ................
000004E0  08 08 08 08 08 08 01 03 03 03 03 03 03 03 03 03  ................
000004F0  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000500  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000510  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000520  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000530  03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000540  03 03 03 03 03 03 08 08 08 08 08 08 08 08 08 08  ................
00000550  08 08 08 08 08 08 08 08 08 08 08 08 08 08 08     ...............
several repetitions.

So there is at least another code to repeat a previous sequence from a dictionary or sliding window ala LZ77.

EDIT: Oh wait! You are completely right. Teaches me to read this thread after going out for drinks. I'll try and whip out something that uses that pattern and see if the game rolls with it.