AlgoTalk

Started by BRPXQZME, January 11, 2014, 09:01:43 PM


BRPXQZME

I'll be the first to confess that I really don't know much about compression algorithms, but I came across this supposedly new form of entropy coding. There is also a companion article explaining the algorithm itself, but it's at least slightly beyond my current understanding. The author claims that one of the goals for this is to bring something like arithmetic coding to a performance level acceptable for "retro" CPUs (such that decoding requires no multiplication or division; only shifts, adds, and masks). It sounds very enticing, doesn't it?
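From my admittedly naive reading of the companion article, the decode step boils down to something like the sketch below. The table construction is exactly the part that goes over my head, so it's left out, and every name here is made up by me; I'm only trying to show that the per-symbol work really is just a lookup, a few shifted bits, and an add.

    #include <stdint.h>

    typedef struct {
        uint8_t  symbol;    /* symbol decoded at this state                 */
        uint8_t  nb_bits;   /* how many fresh bits to pull from the stream  */
        uint16_t new_base;  /* base of the next state before adding bits    */
    } DecodeEntry;

    typedef struct {
        const uint8_t *src; /* compressed bitstream       */
        uint32_t bit_pos;   /* current position, in bits  */
    } BitReader;

    /* read n bits LSB-first -- nothing but shifts, masks and adds */
    static uint32_t read_bits(BitReader *br, int n)
    {
        uint32_t v = 0;
        for (int i = 0; i < n; i++) {
            uint32_t bit = (br->src[br->bit_pos >> 3] >> (br->bit_pos & 7)) & 1;
            v |= bit << i;
            br->bit_pos++;
        }
        return v;
    }

    /* one decode step: emit a symbol and move to the next state */
    static uint8_t decode_one(const DecodeEntry *table, uint32_t *state, BitReader *br)
    {
        const DecodeEntry *e = &table[*state];
        *state = e->new_base + read_bits(br, e->nb_bits);
        return e->symbol;
    }

If I've read it right, that inner step being so cheap is the whole selling point for an 8/16-bit CPU.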

So, seeing as to how some ROM hackers (who work with these sorts of algorithms on these sorts of systems somewhat closely) prowl these forums, would this sort of algorithm be a help given your limitations (on, say, the SNES)? Or is there some hidden restriction that he has not quite anticipated? ... or would you have to take a lot of time out of your busy schedules to bother reading his code and testing an implementation in the first place?

Because if it is worthwhile, I'd rather it come to your attentions sooner than later.

For bonus points: educate me on common compression techniques without blowing my fragile mind to bits and I will have an improved opinion of your didactic skills. Surely there is no greater prize than this! :laugh: (You do not have to do this, of course; it's just that beyond a very vague sense of how the most popular ones work, my eyes kind of glaze over when I try to get deeper into the details.)

For extra bonus derailment points: Once the above has been adequately explained, talk about algorithms that you like or new ones you find interesting, especially if they are related to the craft. I say this because this stuff is a bit too deep and slow-moving for IRC, but I think having an algorithm chat thread would produce some fodder for the documents section and leave a record of the questions that come up concerning the tougher points. Well, as long as it's okay with the administration. (You will not hurt my feelings if this is considered more appropriate for the Newcomer's Board).
we are in a horrible and deadly danger

KC

Interesting. I use a customized Huffman (I dubbed it MultiHuff) as the text compression in almost all of my projects, so I was really curious how this compares to it. I ran a number of tests for that, using the code linked to in the blog post:

  • a 2.28mb file of ASCII text strings, separated by a binary zero. It's all the story dialog from SRWJ
  • the Japanese SRWJ ROM, 16mb
  • an executable of my assembler, 256kb
So three pretty different cases: one file with a very limited range of input characters, a binary file that already contains a lot of compressed data, and uncompressed executable code with some data.
The results were pretty interesting:

FSE compressed the SRWJ text to 1.35mb, while MultiHuff brought it down to 1.07mb.
The SRWJ rom was 13.8mb with FSE and 13.2mb with MultiHuff.
The executable was 209kb with FSE and 193kb with MultiHuff.

So overall FSE won in none of those three cases. On its own it's probably not very efficient, though still better than standard Huffman. It may however fare better in other categories: I didn't test the actual speed, and MultiHuff may not be very cache efficient (see below).

Here's what MultiHuff actually does:
It creates one Huffman tree per output value. After a value is decoded, the tree associated with that value is then used to decode the next value. So the length and encoding of all values depend on the context, i.e. the previous character. This is particularly useful for text, where certain character pairs are FAR more likely than others (how often does 'z' appear after 'w' in English? Probably never. So that character doesn't even appear in the 'w' tree and doesn't steal any bits from other characters). The downside is of course that there are many more headers, and it keeps changing which one it reads from. In a small cache this may lead to many misses.
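In very rough C, the decode side of that idea looks something like this. It's just a sketch of the concept, not my actual MultiHuff code; building or loading the 256 trees (and the header format) is left out entirely:

    #include <stdint.h>
    #include <stddef.h>

    typedef struct HuffNode {
        struct HuffNode *child[2];  /* internal node: follow child[bit]   */
        int symbol;                 /* leaf: decoded byte, otherwise -1   */
    } HuffNode;

    /* trees[prev] is the tree used when 'prev' was the last decoded byte */
    static size_t decode_block(HuffNode *trees[256],
                               const uint8_t *bits, size_t nbits,
                               uint8_t *out, size_t out_max)
    {
        uint8_t prev = 0;            /* context for the very first symbol */
        size_t pos = 0, written = 0;

        while (written < out_max) {
            HuffNode *node = trees[prev];
            while (node->symbol < 0) {               /* walk down to a leaf */
                if (pos >= nbits) return written;
                int bit = (bits[pos >> 3] >> (pos & 7)) & 1;
                node = node->child[bit];
                pos++;
            }
            prev = (uint8_t)node->symbol;            /* becomes the context */
            out[written++] = prev;
        }
        return written;
    }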

CUE

KC, your method is the same as the one used in the Saturn game "Shining Force", and it is quite efficient with text files. I call it Multi Huffman Coder.

I have my own updated version, never released, with source code included: https://www.mediafire.com/?3yq23nmafmvsij7

Bregalad

The proper name for this would be "2nd order Huffman", or something of the sort.

As for an answer to the original post, multiplications and divisions are no problem for retro CPUs, even if they don't have multiply and divide instructions. They'll just take more time to perform, but decompression is usually slow anyway, so needing multiplications or divisions is not, in my view, a valid reason to reject a particular algorithm.

The real difficulty for retro architectures is the limited RAM (even if you probably don't want decompression to take a minute either). Most algorithms were designed with zillions of bytes of RAM in mind.

As for being educated about compression, you should have a look at the compression-related documents on RHDN, especially this one. You can also have a look at the tool I wrote that tries many compression algorithms, with source code for the encoder and decoder, and lots of documentation about the algorithms used in both the readme and the source code.
I don't try 2nd order Huffman or any other improvements on Huffman; in fact, plain old vanilla Huffman is probably the most complex to decode of the set.

In my opinion, the small margin you'd gain by switching from Huffman to arithmetic coding (or any in-between variant) is probably not worth it, considering you could gain much more by combining the compression with, or replacing it by, a non-entropy-based approach. Usually I find dictionary-based algorithms work much better in the general case, although it of course depends on the data!

KaioShin

Quote from: Bregalad on January 12, 2014, 04:35:35 PM
in fact, plain old vanilla Huffman is probably the most complex to decode of the set.

The encoding is so super easy though; I find it much better to implement and understand than LZxx et al. Huffman is pure computer science beauty.

Since BRP asked for compressions we find interesting, one of my favourites is the Burrows-Wheeler Transform (BWT). It's not a compression method on its own, but rather a preprocessing step, and it's a real *light bulb* sort of algorithm. I just love the idea of it. It is completely useless for hacking stuff though, since it requires n^2 memory to process n bytes. It is a transformation that takes any random data block and rearranges the bytes so that it is afterwards perfect for RLE compression. The data basically ends up grouped into long runs of the same letter, "aaaaaaaaaabbbbbbbbbbbbccccdddddd....". And it can still be transformed back into the original! It is quite brilliant. But as I said, massive memory requirements.
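For the curious, the naive forward transform really is only a few lines of C. This is toy code with my own names, purely to show the idea; it's the textbook sort-all-rotations version with exactly the memory/time problem I mentioned, and inverting it also requires remembering where the original rotation ended up in the sorted order.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char *g_text;  /* input, shared with the comparison callback */
    static size_t g_len;

    /* compare two rotations of the input, starting at offsets *a and *b */
    static int cmp_rot(const void *a, const void *b)
    {
        size_t i = *(const size_t *)a, j = *(const size_t *)b;
        for (size_t k = 0; k < g_len; k++) {
            unsigned char ci = g_text[(i + k) % g_len];
            unsigned char cj = g_text[(j + k) % g_len];
            if (ci != cj) return (int)ci - (int)cj;
        }
        return 0;
    }

    int main(void)
    {
        const char *in = "she sells sea shells by the sea shore";
        g_text = in;
        g_len = strlen(in);

        size_t *rot = malloc(g_len * sizeof *rot);
        for (size_t i = 0; i < g_len; i++) rot[i] = i;
        qsort(rot, g_len, sizeof *rot, cmp_rot);   /* sort all rotations */

        /* the BWT output is the last column of the sorted rotation matrix */
        printf("BWT: ");
        for (size_t i = 0; i < g_len; i++)
            putchar(in[(rot[i] + g_len - 1) % g_len]);
        putchar('\n');

        free(rot);
        return 0;
    }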
All my posts are merely personal opinions and not statements of fact, even if they are not explicitly prefixed by "In my opinion", "IMO", "I believe", or similar modifiers. By reading this disclaimer you agree to reply in spirit of these conditions.

Nightcrawler

From an SNES era point of view, it's probably not of much value.

First, most games of that era nearly always have the script broken down into small chunks or areas. So, we're always talking about very small amounts of data being compressed at a time. The limited RAM generally prohibits you from decompressing very large blocks anyway. So, I can't imagine the extra range breakdown over standard Huffman is going to offer any significant gains with such small data amounts.

Second, oftentimes the speed of decompression can be critical if the game decompresses strings on the fly. Several games I know of don't even start decompressing until you press the button to talk to a specific person. It will then decompress a block of text and string count (or pointer offset count) through it to a specific string. In these types of games, you need it to be relatively fast or you'll notice the delays. Obviously other games decompress the block ahead of time and the speed of decompression is of much lesser concern.

Lastly, I already don't see very much gain in standard Huffman over simple dictionary, RLE, and LZ (or a combination thereof). There is usually very little to gain for the added complexity. I imagine this is mostly due, again, to the fact that we are talking about generally small blocks of data. It's only when you're talking about larger blocks that you start to really see the real-world gains of more efficient and complex compression algorithms. With small blocks, you really get diminishing returns in my opinion.
TransCorp - Over 20 years of community dedication.
Dual Orb 2, Wozz, Emerald Dragon, Tenshi No Uta, Glory of Heracles IV SFC/SNES Translations

Gideon Zhi

Pretty much what Nightcrawler said. In addition, when speed is a factor an already-in-place implementation of Huffman is usually sufficient, and in my experience it rarely pays to implement anything more than DTE in a game except where space is a major, major issue and expansion is impossible.

BRPXQZME

Thanks, guys. Very informative, especially since I checked and compression will not be covered in my remaining coursework, nor have my professors always been very helpful about saying where the material will be practical anyway :beer:
(but I did check the documents section; that is literally the first place I checked... nothin doin :-[)

In the spirit of sharing, I'm going to talk about something that's more related to game development than to hacking: a little data structure called the quadtree. A quadtree is a kind of tree where the branch nodes have four children (leaves, as usual, have none). As the helpful image from Wikipedia shows, you would generally subdivide a 2D space into four quadrants, then subdivide as necessary until all the nodes have as much detail as you want. Basically, the octree's hot 2D cousin.

This is not strictly an algorithm in itself, no, but it lends itself to tasks like spatial indexing. It works well when not indexing at all won't do, but a simple 2D grid wouldn't be sufficient either.

The peril of not indexing: A simple distance calculation between objects is not too expensive. For circles, you don't even need the strict distance; you can just compare the squared differences in coordinates with adjustments for radii. But it is more expensive if you go beyond representing everything as a circle, and when you have a lot of things to test... well, if you were to test every object against every other object, you would have to perform (n² - n)/2 tests (it's the handshake problem). That said, if you know that's not going to be a problem, you probably don't want to add the complexity.
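In C, that cheap circle check is just the following (the struct is of course made up for the example); note that both sides of the comparison stay squared, so no square root is ever taken:

    #include <stdio.h>

    typedef struct { float x, y, r; } Circle;

    static int circles_overlap(Circle a, Circle b)
    {
        float dx = a.x - b.x;
        float dy = a.y - b.y;
        float rs = a.r + b.r;
        return dx * dx + dy * dy <= rs * rs;   /* squared on both sides */
    }

    int main(void)
    {
        Circle p = { 0.0f, 0.0f, 2.0f };
        Circle q = { 3.0f, 0.0f, 1.5f };
        printf("%s\n", circles_overlap(p, q) ? "hit" : "miss");
        return 0;
    }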

Why a grid might not work: A grid is all well and good if you know the objects are going to be spaced reasonably uniformly. You can optimize out more expensive tests by first using the "manhattan distance" (bad name, everyone who's ever gotten a good look at a map of the island knows the blocks are longer one way; other names for it are incredibly awkward, though). The problem is that in video games, that's hardly a guarantee. What you tend to have is some empty spaces over here, and some crowded spaces over there. That said, if you know that's not going to be a problem either, then you probably don't want to add the complexity.
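The manhattan-distance filter, at least as I understand it, works because |dx| + |dy| is never smaller than the true distance and never more than √2 times it. So if even the manhattan distance exceeds √2 times the combined radii the pair can be thrown out with nothing but adds and compares, and if it is already within the combined radii they definitely touch; only the in-between cases need the squared-distance test. The fudge factor and names below are mine:

    #include <math.h>

    typedef struct { float x, y, r; } Disc;   /* hypothetical, for the sketch */

    /* -1 = definitely apart, 1 = definitely touching, 0 = run the exact test */
    static int quick_circle_filter(Disc a, Disc b)
    {
        float m  = fabsf(a.x - b.x) + fabsf(a.y - b.y);
        float rs = a.r + b.r;
        if (m > rs * 1.4142136f) return -1;   /* even the overestimate is too far     */
        if (m <= rs)             return  1;   /* the overestimate is already in range */
        return 0;
    }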

When you have items placed in a quadtree properly, however (and that is not trivial and is left as an exercise to the reader, but in collision detection what isn't?), you can simply use the spatial properties of the tree to partition out your collision detection. Objects only need to be tested against other objects related to their tree's node (all descendants, but only direct ancestors).
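Here's a bare-bones point quadtree in C to make that concrete. This is my own toy sketch rather than anything from a real engine: a node splits into four children once it holds more than a handful of points, and a query only descends into children whose squares can actually reach the query circle. Removal, rebalancing, objects with extent, and degenerate cases (like a pile of identical points) are all glossed over.

    #include <stdlib.h>

    #define NODE_CAP 4

    typedef struct { float x, y; } Point;

    typedef struct QNode {
        float cx, cy, half;        /* square region: center and half-size      */
        Point pts[NODE_CAP];
        int count;                 /* points stored in this node (leaves only) */
        struct QNode *child[4];    /* index = (y >= cy)*2 + (x >= cx), or NULL */
    } QNode;

    static QNode *qnode_new(float cx, float cy, float half)
    {
        QNode *n = calloc(1, sizeof *n);
        n->cx = cx; n->cy = cy; n->half = half;
        return n;
    }

    static int quadrant_of(const QNode *n, Point p)
    {
        return (p.y >= n->cy) * 2 + (p.x >= n->cx);
    }

    static void qnode_insert(QNode *n, Point p)
    {
        if (n->child[0] == NULL) {                 /* still a leaf */
            if (n->count < NODE_CAP) {
                n->pts[n->count++] = p;
                return;
            }
            /* full: split into four quadrants and push existing points down */
            float h = n->half * 0.5f;
            n->child[0] = qnode_new(n->cx - h, n->cy - h, h);
            n->child[1] = qnode_new(n->cx + h, n->cy - h, h);
            n->child[2] = qnode_new(n->cx - h, n->cy + h, h);
            n->child[3] = qnode_new(n->cx + h, n->cy + h, h);
            for (int i = 0; i < n->count; i++)
                qnode_insert(n->child[quadrant_of(n, n->pts[i])], n->pts[i]);
            n->count = 0;
        }
        qnode_insert(n->child[quadrant_of(n, p)], p);
    }

    /* count points within radius r of (qx, qy), pruning subtrees whose
       square cannot possibly reach the query circle */
    static int qnode_query(const QNode *n, float qx, float qy, float r)
    {
        float dx = qx - n->cx, dy = qy - n->cy;
        if (dx < 0) dx = -dx;
        if (dy < 0) dy = -dy;
        if (dx > n->half + r || dy > n->half + r) return 0;

        int hits = 0;
        for (int i = 0; i < n->count; i++) {
            float px = n->pts[i].x - qx, py = n->pts[i].y - qy;
            if (px * px + py * py <= r * r) hits++;
        }
        if (n->child[0])
            for (int i = 0; i < 4; i++)
                hits += qnode_query(n->child[i], qx, qy, r);
        return hits;
    }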

Barring any subtleties that may arise (this is mostly helpful with static frames, for instance; it doesn't take motion into account, so it won't help with the "bullet through paper" problem or anything), the main tradeoff is that this is definitely overhead (I've seen recommendations to rebuild the tree per frame; not sure that's strictly necessary, but haven't gotten to test that hypothesis yet). In a good number of games, this simply won't be a necessary optimization. Still, there's other neat stuff you can do with this data structure, not all of it relevant to games.
we are in a horrible and deadly danger

Bregalad

Quote
Several games I know of don't even start decompressing until you press the button to talk to a specific person. It will then decompress a block of text and string count (or pointer offset count) through it to a specific string. In these types of games, you need it to be relatively fast or you'll notice the delays. Obviously other games decompress the block ahead of time and the speed of decompression is of much lesser concern.
In fact I think it's almost the opposite.
Even if the text is printed at 1 symbol per frame (which is quick), you still have an ENTIRE frame to decode a single symbol, which means a very slow decompression will work just fine in this case.
An example where speed is critical is compressed maps, if you decode rows/columns of them on the fly while scrolling. You have to access a row or column randomly, then decompress it and send it to VRAM, all during one frame. It definitely has to be fast.


Quote
First, most games of that era nearly always have the script broken down into small chunks or areas.
Yes, most compression-related tools completely ignore this "large amount of small chunks" approach, which is why I wrote my own.
In fact Huffman is nicer than LZ when it comes to small chunks: you can have a single tree and decode many small chunks using the same tree. LZ-based algorithms can also be tweaked to work with this "small chunk" approach, but then they don't compress as well as with large chunks (see my Compress Tools for full info).

FAST6191

An interesting link/project, so thanks for that. Most of my interest in compression goes into the lossy stuff (mainly video related), with the lossless stuff being more so I can at least play along in these circles or back in the data recovery world (combined lossy and recovery methods are an interest of mine*). My usual link if you want a nice overview of the more mechanical side of things is https://ece.uwaterloo.ca/~ece611/LempelZiv.pdf (warning, PDF), which despite the name covers the other types quite nicely. Likewise, since I don't usually move outside of the 32-bit consoles and nothing before the GBA, I am not so limited.
Most of information theory, like many things in maths with theory added on the end, seems to trend towards the crazy abstract that could far more easily be summed up in words/examples. Though it is necessary for when you want to interface it with other areas of maths and take it to really high levels so I am not inclined to say anything there.


*As an aside and related to that, if you have not seen http://www.youtube.com/watch?v=ruEn7TE4YMM yet I would highly suggest it; it deals with SD cards and their internals (if nothing else it goes into a bit of depth on the onboard 8051 chips that many SD cards have).

BRPXQZME

I actually got a book on information theory as a birthday present when I was, like, 12-ish. It was... probably not a good idea.
we are in a horrible and deadly danger

Avicalendriya

.

Pennywise

BRP, did you have a misplaced childhood?

I feel sorry for any kid who got that book for a birthday present. What happened to video games, comic books etc?

BRPXQZME

we are in a horrible and deadly danger

Nightcrawler

That's not how it works in these cases though. You're not decoding just a single symbol in many of these types of games. These games decompress the *entire* block first, *then* parse or iterate through the entire block *again* to find the string they want. This can be followed by a copy, formatting, manipulation, and parsing of the string. Lastly they start dynamically drawing the characters. Somewhere in the mix, the text window is usually drawn too. It's a very slow (and inefficient) process that can easily extend over several frames. It gets even worse if the game adds any non-text related tasks in there, which often happens if the game's 'text' blocks are actually scripting blocks, menus, etc. The SNES is barely fast enough to dynamically render a handful of characters per frame even if there were no compression or processing. :P

Other games of course do not do this type of process and I would agree with you in such cases.
TransCorp - Over 20 years of community dedication.
Dual Orb 2, Wozz, Emerald Dragon, Tenshi No Uta, Glory of Heracles IV SFC/SNES Translations

Gemini

Quote from: Nightcrawler on January 13, 2014, 06:53:29 PM
That's not how it works in these cases though. You're not decoding just a single symbol in many of these types of games. These games decompress the *entire* block first, *then* parse or iterate through the entire block *again* to find the string they want.
The block thing isn't necessarily true or imperative; it depends on your coding choice and algorithm. If you have a decompression state structure and a function to pull a single symbol from the compressed sequence, you can call this function once or twice a frame and retrieve only what you need (say, a regular char to parse and any following parameters), leaving the rest of the CPU completely free for other tasks. With FSE and Huffman it's possible to achieve this result very easily as you're already bound to have a decompression state somewhere in there.
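Roughly like this, in generic C rather than anything taken from an actual game (the plain Huffman tree walk is just a placeholder for whatever the format actually uses, FSE state included):

    #include <stdint.h>

    typedef struct Node { struct Node *child[2]; int symbol; } Node;

    typedef struct {
        const uint8_t *src;   /* compressed text block           */
        uint32_t bit_pos;     /* where we are in it, in bits     */
        const Node *root;     /* code tree for this block        */
    } TextStream;

    /* called from the game loop once or twice per frame:
       pulls exactly one symbol and leaves the state ready for the next */
    static int text_pull_symbol(TextStream *ts)
    {
        const Node *n = ts->root;
        while (n->symbol < 0) {                      /* walk down to a leaf */
            int bit = (ts->src[ts->bit_pos >> 3] >> (ts->bit_pos & 7)) & 1;
            n = n->child[bit];
            ts->bit_pos++;
        }
        return n->symbol;
    }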

Nightcrawler

That's true. I concur they could benefit from replacing the system as you describe if it were feasible to do so. Of course, that would probably be true of any more efficient algorithm and/or process than what's already in place. This is getting away from specific benefits of FSE though. I shan't veer off-topic any further as I am merely a spectator. Perhaps more motivated individuals will do some in-game comparisons so we have some more data to toss around. :)
TransCorp - Over 20 years of community dedication.
Dual Orb 2, Wozz, Emerald Dragon, Tenshi No Uta, Glory of Heracles IV SFC/SNES Translations