News: 11 March 2016 - Forum Rules
Current Moderators - DarkSol, KingMike, MathOnNapkins, Azkadellia, Danke

Author Topic: (PSX - Aconcagua) Can't figure out how compressed text works  (Read 2689 times)

Valendian

  • Jr. Member
  • **
  • Posts: 41
    • View Profile
Re: (PSX - Aconcagua) Can't figure out how compressed text works
« Reply #20 on: February 19, 2018, 06:19:17 pm »
EI is the bitsize of the pointer and EJ is the same for copy size. N an F convert these to buffer size and lookahead size. The lookahead is ised during compression to find longer duplicates within the sliding window. P is also only used during compression. It is the minimum string length, less than this will be stored raw.

The decoder works like this.
Read a bit.
If its 1 then the next bytes are raw. so read 8 bit count and copy that many raw bytes from the packed input to the output. Every raw byte is also appended to the sliding window (buffer).

But if the bit was clear then read a pointer + size pair and copy from the window instead.

Your version is not bit packing so replace getbit with readbyte.

Note LZSS uses sliding windows. LZ just offsets from current output pointer. Watch that slight difference. As Gemeni noted all LZ decoders have a similar form. The different versions are just tweaks.

Gemini

  • Hero Member
  • *****
  • Posts: 1987
  • 時を越えよう、そして彼女の元に戻ろう
    • View Profile
    • Apple of Eden
Re: (PSX - Aconcagua) Can't figure out how compressed text works
« Reply #21 on: February 19, 2018, 09:07:53 pm »
What do you mean by different locations? As in, while decompressing it can write a byte at 0x0100, then next, instead of writing to 0x0101, it writes to 0x0150 or something like that? Because I'm pretty sure it writes them all consecutively, at least it looks that way, not sure how I would check whether there happens to be a byte that is written somewhere else.
In terms of code it means that it writes decompressed data to two buffers: destination and a ring buffer. The latter is also used as source for compressed pairs.
I am the lord, you all know my name, now. I got it all: cash, money, and fame.

730

  • Jr. Member
  • **
  • Posts: 43
    • View Profile
Re: (PSX - Aconcagua) Can't figure out how compressed text works
« Reply #22 on: February 20, 2018, 03:24:28 am »
I DID IT!!! mostly...
There's a couple of bugs like some bytes way at the bottom that are different from RAM


though maybe they were modified after loading, I'll have to see that later.
I think I should also figure out where the size is, not only because I might have "decompressed" some garbage, but also because the loop isn't stopping at EOF (that's probably just me doing something wrong, still would be nice to know the size); I couldn't find the value Vehek says is right after the PS-X EXE header, maybe I didn't try hard enough.

Anyways, here's a Pastebin with the current code, I'm gonna have to do some debugging tomorrow, and verify if it actually works with all the other files (I hope they don't have different parameters or anything or I'm gonna die).
When I fix that stuff and find out the info I need I suppose I can make the compressor.

The file I used to test this was the one starting at 0x57000 of PROGRAM.BIN, with the first 0x1000 bytes removed since it really starts at 0x57800 like Vehek said. I would upload it but I'm not sure if that's legal...? (let me know if I can upload it)

(also please any suggestions regarding the code, maybe there's some abstractions I'm not aware of and it can be made easier to understand)
« Last Edit: February 20, 2018, 03:32:01 am by 730 »
Discord: 730#8471

STARWIN

  • Sr. Member
  • ****
  • Posts: 438
    • View Profile
Re: (PSX - Aconcagua) Can't figure out how compressed text works
« Reply #23 on: February 20, 2018, 08:33:18 am »
(also please any suggestions regarding the code, maybe there's some abstractions I'm not aware of and it can be made easier to understand)

you can zero init the buffer with char buf[BUFFER_SIZE]={0};

if buf is declared at global level, it will be zero init even by itself and you don't have to pass the array parameter when calling push

i would always have the code block after if well defined with { } around, even single liners, and that second line with EOF may be broken btw

there is almost never a reason to use a short int instead of int

(you can declare i temporarily "for (int i=0;" in for instead of declaring i earlier, if you want)

if you want to reduce copying in push, you can have a globally declared "int start=0;" and use the following:

pull with =buf[(start+offset)%BUFFER_SIZE];

push with buf[start%BUFFER_SIZE]=c;start++;

(not tested)

(it would be good manners to set start to 0 when it grows big enough, but i doubt it matters here)
« Last Edit: February 20, 2018, 08:39:05 am by STARWIN »

Valendian

  • Jr. Member
  • **
  • Posts: 41
    • View Profile
Re: (PSX - Aconcagua) Can't figure out how compressed text works
« Reply #24 on: February 20, 2018, 12:09:24 pm »
you can zero init the buffer with char buf[BUFFER_SIZE]={0};

Not when the array is used more than once.

@730 if those errors only show up at the end then all you are missing is the end of file logic.
« Last Edit: February 20, 2018, 12:29:34 pm by Valendian »

730

  • Jr. Member
  • **
  • Posts: 43
    • View Profile
Re: (PSX - Aconcagua) Can't figure out how compressed text works
« Reply #25 on: February 24, 2018, 06:39:09 pm »
I rewrote the program in C++ (full with classes and modularity and all that stuff), and fixed the bug (besides from the EOF thing). This bug was due to how I handled the sliding window, moving it every time a byte was written instead of leaving it fixed while it was writing a sequence of bytes from an offset and sliding it after that was done, which made it impossible to retrieve bytes from the proper position when (offset + i) > 0xFFF, which overflowed so it should've gone to the beginning of the window to get these bytes (also some arithmetics on how to get the bytes in those situations).

Anyways here's the program. This is basically the first time I've ever done something (so) OOP outside of class so this code is probably garbage, I'd like to know what I can improve (some notes already in TODO in main.cpp).

One thing I still don't have clear is, what algorithm is this specifically? Is it a variant with a name or is it just some made up LZ for Aconcagua? If it's something with a name it's gonna be easier to find out how to write the compressor, I've no idea how I'm gonna go about that yet.

edit: I think I figured out the compression algorithm, I'm gonna try coding it tomorrow. I forgot another thing that was really wrong in the code I made, I didn't destroy objects when I was supposed to, woops.
« Last Edit: February 24, 2018, 11:32:28 pm by 730 »
Discord: 730#8471

Valendian

  • Jr. Member
  • **
  • Posts: 41
    • View Profile
Re: (PSX - Aconcagua) Can't figure out how compressed text works
« Reply #26 on: February 25, 2018, 07:21:37 am »
You're using a std::vector<char> for the sliding window, I think this is creating a new problem rather than solving an existing one. The problem is made more simple by using a fixed sized c-style char array all you need to do is use modulus wrap around to enforce boundary constraints. Also the negative start index in the Sliding Window is dubious, I'm not sure if this is because you are using a vector but I really would consider dropping the std::vector and switching to a fixed sized c-style char array.

Code: [Select]
    // using a std::vector is creating more problems than it solves
    realOffset = ((offset + i) > 0xFFF ? (offset + i) - 0x1000 : offset + i);
    buf.push_back(buf[sldWindow->getStart() + realOffset]);

Code: [Select]
    // if you use a char[] it would look more like this
    buf[end++ % sizeof(buf)] = buf[(offset + i) % sizeof(buf)];

And this appears to be LZSS. The distinguishing feature of this LZ variant is that it maintains a separate sliding window. It treats the sliding window as a FIFO queue. Earlier variants simply took the position in the output stream and grabbed bytes from earlier positions, (the sliding window was a subset of the output stream and would contain duplicate entries).

Also when you get around to encoding you will need to layer some other technique on top to get satisfactory results, the basic example I pointed to earlier doesn't do anything fancy and so its compression ratio is rather poor. Huffman Coding can be used to improve this ratio.
« Last Edit: February 25, 2018, 07:34:31 am by Valendian »