11 March 2016 - Forum Rules

Main Menu

Golden Sun text compression

Started by Griever, October 28, 2014, 02:39:15 PM

Previous topic - Next topic


Hello, I'm digging into different text compressions and found one in Golden Sun for GBA is quite interesting.
Sure, I know about Labmaster's doc and tool, but unpacking routine was just rewrited atop on asm code, and I'm trying to get the idea of compression.
Here's graph of 1 character decode. That's all pretty complicated but the part I don't understand is this one.
Hard to explain, but I'll try.
The function receives previously decoded char code. Each of this chars has some kind of path in recursive tree (which I wanted to discuss here), so actual source text bitstream just decides either to parse tree one round or emit char.
Before each tree there's codes of characters, which can be put after this previously decoded char. So one tree parse returns offset-back to appropriate character. Source text bitstream just decides if you need to parse one more time and increase this offset-back.
Now for path in tree: if you go right (1), you get out of parsing, if you get left(0), you go into recursive tree, so you'll have to meet more 1s to get out of parsing and so on. At the end of parsing you have to count how much 1's did you get  and that will be offset-back.
Source bitstream can also skip some bits from the beginning of this tree path, so you can set a root in any place of this path, thus changing result offset-back.

For me all this path in recursive tree storage is overcomplicated and unoptimal (not saying, I don't understand it at all). But I'm sure there should be some simple explanation how this scheme works.
Maybe it's modification of well known algorithms or someone has seen this kind of tree in previous?


Last time I checked Golden Sun used an LZSS derivative, while the second game used Huffman coding. I might be misremembering, though.


Quote from: mziab on October 28, 2014, 04:18:16 PM
Last time I checked Golden Sun used an LZSS derivative, while the second game used Huffman coding.
Yes, LZSS is used by Golden Sun for tileset compression. The algorithm was straight, so I had no problem with it.
This code is surely not a straightforward Huffman. Can anyone, please, take a look at graphs, I've pointed in top message and try to identify compression scheme?


Sorry for necroposting, but I've figured all out once I've constructed tree and tried to crawl in it by hands.
In general, iteration bitstream is a binary tree, serialized in pre-order traversal. Source stream is a plain path in that tree. Whole compression scheme is a Huffman with individual tree for each char.