I have what I think is a simple question, I'm just not too good on the math-y side of programming in order to figure this one out. I have a string compression function that reduces repeating strings down into a compressed form. Something like: "ABABAB" -> 3AB. The decompression is really simple and results in a perfectly expanded product. However, I'm having a bit of a difficult time getting the compression to... well... compress.
My first attempt was a naive approach of just starting with single repeating characters, then 2, then 3, then 4, etc. until the max amount of characters allowed. However, this still tends to result in unoptimized outcomes (given the capped nature of the compression). An example would be:
AB AB AB ABC ABC
My current code would see AB AB AB AB and list it as 4AB, then it'd see a C with nothing good after so it'd do 1C, and then it'd see the 1ABC. Resulting in a compressed string of 4AB1C1ABC. However, it's fairly obvious to us humans that the correct compression would be 3AB2ABC.
I was wondering if there was a better way of scanning the input to get a better compression.
Simply decompressing an existing string and recompressing it boosts me from 2KB to 6KB in one example.
And I kinda lied, it's not really strings, but bytes for a particular part of my translation hack. But the principle is the same.
Here's a quick pastebin paste of my current rough code: http://pastebin.com/JebzabSZ
Basically I check the single byte compression, since it works a bit differently than the rest. From there I check the varying amount of bytes that will be repeated, and keep counts for each, and simply grab the one that works the best at the end and use that. But as I mentioned, it's a naive approach and commonly leaves bad compression in parts.
Thoughts? Has anyone else had to deal with something like this?
It's not too important, since the game is flexible on file sizes. But I have a feeling there's a hard limit on the total amount of stuff in general, so it's best to keep everything small. There's some other optimizations I can do as well, which I haven't tried yet. But that's unrelated to the question I'm posing here.