News:

11 March 2016 - Forum Rules

Main Menu

Z80 Decompression Routine

Started by Pennywise, April 25, 2013, 11:09:20 PM

Previous topic - Next topic

Pennywise

Alright so I recently checked out this b/w GB game and much to my surprise the game's main text is compressed and the not garden variety dictionary kind either. I was kinda surprised that a game on this platform and its age, I would find something more complicated. So I was just kinda curious if anyone recognized the compression or this is just something the devs cooked up. I realize it's going to require custom tools for me to even rip the script out. I am however not familiar with various kinds of compression other than what I've come across in the simple Dictionary and RLE stuff.

23d6: 19 ADD HL,DE A:04 F:c0 BC:0000 DE:4000 HL:0204 SP:cefe
23d7: 11 00 d9 LD DE,d900 A:04 F:c0 BC:0000 DE:4000 HL:4204 SP:cefe
23da: 2a LD A,(HL+)[4204]=d1 A:04 F:c0 BC:0000 DE:d900 HL:4204 SP:cefe
23db: fe 04 CP 04 A:d1 F:c0 BC:0000 DE:d900 HL:4205 SP:cefe
23dd: 38 03 JR C, +03 [23e0] A:d1 F:c0 BC:0000 DE:d900 HL:4205 SP:cefe
23df: c3 f1 23 JP 23f1     A:d1 F:c0 BC:0000 DE:d900 HL:4205 SP:cefe
23f2: 47 LD B,A     A:d1 F:c0 BC:0000 DE:d900 HL:4205 SP:cefe
23f3: 2a LD A,(HL+)[4205]=01 A:d1 F:c0 BC:d100 DE:d900 HL:4205 SP:cefe
23f4: 4f LD C,A     A:01 F:c0 BC:d100 DE:d900 HL:4206 SP:cefe
23f5: 2a LD A,(HL+)[4206]=a4 A:01 F:c0 BC:d101 DE:d900 HL:4206 SP:cefe
23f6: e5 PUSH HL     A:a4 F:c0 BC:d101 DE:d900 HL:4207 SP:cefe
23f7: 67 LD H,A     A:a4 F:c0 BC:d101 DE:d900 HL:4207 SP:cefc
23f8: 78 LD A,B     A:a4 F:c0 BC:d101 DE:d900 HL:a407 SP:cefc
23f9: cd 1e 24 CALL 241e A:d1 F:c0 BC:d101 DE:d900 HL:a407 SP:cefc
241f: cb 3f SRL A            A:d1 F:c0 BC:d101 DE:d900 HL:a407 SP:cefa
2421: cb 3f SRL A        A:68 F:c0 BC:d101 DE:d900 HL:a407 SP:cefa
2423: 12 LD (DE),A A:34 F:c0 BC:d101 DE:d900 HL:a407 SP:cefa
2424: 13 INC DE     A:34 F:c0 BC:d101 DE:d900 HL:a407 SP:cefa
2425: c9 RET     A:34 F:c0 BC:d101 DE:d901 HL:a407 SP:cefa
23fc: 79 LD A,C     A:34 F:c0 BC:d101 DE:d901 HL:a407 SP:cefc
23fd: cb 38 SRL B     A:01 F:c0 BC:d101 DE:d901 HL:a407 SP:cefc
23ff: 1f RRA     A:01 F:c0 BC:6801 DE:d901 HL:a407 SP:cefc
2400: cb 38 SRL B            A:80 F:c0 BC:6801 DE:d901 HL:a407 SP:cefc
2402: 1f RRA     A:80 F:c0 BC:3401 DE:d901 HL:a407 SP:cefc
2403: cd 1e 24 CALL 241e A:40 F:c0 BC:3401 DE:d901 HL:a407 SP:cefc
241f: cb 3f SRL A             A:40 F:c0 BC:3401 DE:d901 HL:a407 SP:cefa
2421: cb 3f     SRL A       A:20 F:c0 BC:3401 DE:d901 HL:a407 SP:cefa
2423: 12 LD (DE),A A:10 F:c0 BC:3401 DE:d901 HL:a407 SP:cefa
2424: 13 INC DE A:10 F:c0 BC:3401 DE:d901 HL:a407 SP:cefa
2425: c9 RET A:10 F:c0 BC:3401 DE:d902 HL:a407 SP:cefa
2406: 7c LD A,H A:10 F:c0 BC:3401 DE:d902 HL:a407 SP:cefc
2407: cb 39 SRL C    A:a4 F:c0 BC:3401 DE:d902 HL:a407 SP:cefc
2409: 1f RRA A:a4 F:c0 BC:3400 DE:d902 HL:a407 SP:cefc
240a: cb 39 SRL C A:d2 F:c0 BC:3400 DE:d902 HL:a407 SP:cefc
240c: 1f RRA A:d2 F:c0 BC:3400 DE:d902 HL:a407 SP:cefc
240d: cb 39 SRL C A:69 F:c0 BC:3400 DE:d902 HL:a407 SP:cefc
240f: 1f RRA A:69 F:c0 BC:3400 DE:d902 HL:a407 SP:cefc
2410: cb 39 SRL C A:34 F:c0 BC:3400 DE:d902 HL:a407 SP:cefc
2412: 1f RRA A:34 F:c0 BC:3400 DE:d902 HL:a407 SP:cefc
2413: cd 1e 24 CALL 241e A:1a F:c0 BC:3400 DE:d902 HL:a407 SP:cefc
241f: cb 3f SRL A A:1a F:c0 BC:3400 DE:d902 HL:a407 SP:cefa
2421: cb 3f SRL A A:0d F:c0 BC:3400 DE:d902 HL:a407 SP:cefa
2423: 12 LD (DE),A A:06 F:c0 BC:3400 DE:d902 HL:a407 SP:cefa
2424: 13 INC DE A:06 F:c0 BC:3400 DE:d902 HL:a407 SP:cefa
2425: c9 RET A:06 F:c0 BC:3400 DE:d903 HL:a407 SP:cefa
2416: 3e 3f LD A,3f A:06 F:c0 BC:3400 DE:d903 HL:a407 SP:cefc
2418: a4 AND H A:3f F:c0 BC:3400 DE:d903 HL:a407 SP:cefc
2419: 12 LD (DE),A A:24 F:c0 BC:3400 DE:d903 HL:a407 SP:cefc
241a: 13 INC DE A:24 F:c0 BC:3400 DE:d903 HL:a407 SP:cefc
241b: e1 POP HL A:24 F:c0 BC:3400 DE:d904 HL:a407 SP:cefc
241c: c3 d9 23 JP 23d9 A:24 F:c0 BC:3400 DE:d904 HL:4207 SP:cefe


So just some context, $D900 is the beginning of the spot in WRAM where the decompressed bytes are stored after having had some bit math done to them. $4205 etc are the compressed bytes. From what I can tell, it works in groups of 3 and end up with 4 decompressed bytes each time.

LostTemplar

It's just packing four 6-bit values into three bytes:


aaaaaabb bbbbcccc cccdddddd <=> 00aaaaaa 00bbbbbb 00cccccc 00dddddd

furrykef

You should have seen me trying to figure out the Huffman compression in Battletoads. I had no idea the game used Huffman compression until I realized I was looking at a tree data structure. ("Wait, this array is indexing into itself? And sometimes into another array? ...Oh, I get it, it's a tree!") When I understood, I didn't know whether to feel relieved (since I knew what was going on) or run away screaming (since I knew what was going on)! It turned out that an extraction script wasn't too difficult to write from that point, though; I basically just wrote a script that mimicked what the ASM code did. I'm glad I was already familiar with the principles behind Huffman, though, or I'd probably have just gone mad.

I was also amused to find that, a couple of months later, rainwarrior independently ripped the text from Battletoads, not knowing I'd already done it.

LostTemplar

One SNES game I'm working is using - no joke - LZW compression. Pretty much the same as used in the GIF file format (I wonder if that wasn't a patent infringement). Fortunately I already knew LZW, otherwise it might have been a lot harder to understand and reverse engineer. It's by the way one of the reasons the game is pretty slow during some operations (like loading) - not a very fast algorithm on the SNES.

Dwedit

I've written APLIB for the Z80 before.  It works okay, but it's much faster on ARM, because conditional instructions are awesome.
"We are merely sprites that dance at the beck and call of our button-pressing overlord."