
Author Topic: LZSS compressor needed  (Read 3369 times)

Raccoon Sam

  • Jr. Member
  • **
  • Posts: 59
  • Left is Right and Right is Wrong
LZSS compressor needed
« on: June 26, 2017, 09:00:24 am »
Please help, I'm willing to pay a few bucks.  :-\

The decompression method for Pocky & Rocky was spec'd some time ago and some pseudocode/disassembly got pushed to GitHub too.
I already wrote a decompressor, since that's pretty easy, and it works alright, but I now need a compressor so I can do insertion too instead of just extraction.
My Python/C skills just aren't enough for the job so I need your help.

The format is explained much better in the /vr/ thread (the GitHub code gives a good idea too, if you can read it), but here's a quick visualisation I made. The LZSS'd data in ROM is the top field and the decompressed data in RAM is the bottom field.


The format is straightforward and decompressing it is easy peasy, but I can't wrap my head around how to make a compressor. The pseudo-RLE shown in the green compress word especially is giving me a headache.
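To make the pseudo-RLE part concrete, here is a minimal Python sketch. It assumes the copy-word layout used by the decoder posted in the reply below (top 5 bits hold length - 3, low 11 bits hold distance - 1); `pack_copy` and `apply_copy` are hypothetical helper names made up for illustration, not part of any existing tool. The point is that a copy whose distance is shorter than its length re-reads bytes it has just written, which turns a back-reference into a run.

```python
def pack_copy(distance, length):
    """Pack a back-reference into one 16-bit copy word (assumed layout)."""
    if not 1 <= distance <= 0x800:
        raise ValueError("distance must be 1..2048")
    if not 3 <= length <= 0x22:
        raise ValueError("length must be 3..34")
    return ((length - 3) << 11) | (distance - 1)


def apply_copy(out, word):
    """Append the bytes described by a copy word to the output buffer."""
    length = (word >> 11) + 3
    distance = (word & 0x7FF) + 1
    for _ in range(length):
        # Byte-by-byte on purpose: when distance < length the copy reads
        # bytes it has just written, which is what produces the run.
        out.append(out[-distance])


# One literal "A", then "copy 8 bytes from 1 byte back" gives a run of "A".
buf = bytearray(b"A")
apply_copy(buf, pack_copy(distance=1, length=8))
run = bytes(buf)

# Two literals "AB", then "copy 6 bytes from 2 bytes back" repeats the pair.
buf = bytearray(b"AB")
apply_copy(buf, pack_copy(distance=2, length=6))
pattern = bytes(buf)
```

So a compressor does not need a separate RLE case: if the match search is allowed to compare past the current position, runs fall out as ordinary matches with a small distance.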

For convenience, I've attached a compressed file and its corresponding decompressed file.
https://drive.google.com/drive/folders/0B4uCvH_ooCHfZmpOU25JcUM3LTQ?usp=sharing

bonus info:
I'm having a wedding party on the 19th of August, and to keep the kids (well, any guest who likes video games, I guess) busy, we've set up a SNES with an Everdrive. Because Pocky bears a striking resemblance to my wife and Rocky to our fat dog, we thought it would be fitting to have a simple graphics/dialogue ROM hack of Pocky & Rocky. I can decompress data, but getting it back in is impossible for the time being, so I thought you guys might be able to help.  :beer:

Zoinkity

  • Hero Member
  • *****
  • Posts: 565
Re: LZSS compressor needed
« Reply #1 on: June 26, 2017, 12:07:27 pm »
Written in Python 3.
You already have a decoder, but I threw one in here to test the output.  Call encode() with a bytes-like object to get back a compressed bytes object.  I added some simple prediction, so the output might be a tad smaller (a few dozen bytes, tops).

Code:
#!/usr/bin/env python3

def decode(data):
    """Decompress a stream; if the first byte is zero, data is returned as-is."""
    if not data[0]:
        return data
    # Be lazy and just assemble bytes.
    sz = (data[2] << 8) | data[1]
    d = iter(data[3:])
    c = 1
    out = bytearray()
    while len(out) < sz:
        if c == 1:
            # Refill.
            c = 0x10000 | next(d) | (next(d) << 8)
        if c & 1:
            # Copy word: top 5 bits are length - 3, low 11 bits are distance - 1.
            p = next(d) | (next(d) << 8)
            l = (p >> 11) + 3
            p &= 0x7FF
            p += 1
            for i in range(l):
                out.append(out[-p])
        else:
            out.append(next(d))
        c >>= 1
    return bytes(out)

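def _search(data, pos, sz):
    """Find the longest earlier match for data[pos:]; returns (position, length), length 0 on a miss."""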
    ml = min(0x22, sz - pos)
    if ml < 3:
        return 0, 0
    mp = max(0, pos - 0x800)
    hitp, hitl = 0, 3
    if mp < pos:
        hl = data[mp:pos+hitl].find(data[pos:pos+hitl])
        while hl < (pos - mp):
            while (hitl < ml) and (data[pos + hitl] == data[mp + hl + hitl]):
                hitl += 1
            mp += hl
            hitp = mp
            if hitl == ml:
                return hitp, hitl
            mp += 1
            hitl += 1
            if mp >= pos:
                break
            hl = data[mp:pos+hitl].find(data[pos:pos+hitl])
    # hitl is one more than the best match length here, so below 4 means a miss.
    if hitl < 4:
        hitl = 1
    return hitp, hitl-1

def encode(data):
    """Compress a bytes-like object (at most 0xFFFF bytes, since the size field is 16 bits)."""
    from struct import Struct
    HW = Struct("<H")

    cap = 0x22
    sz = len(data)
    out = bytearray(b'\x01')
    out.extend(HW.pack(sz))
    c, cmds = 0, 3
    pos, flag = 0, 1
    out.append(0)
    out.append(0)
    while pos < sz:
        hitp, hitl = _search(data, pos, sz)
        if hitl < 3:
            # Push a raw if copying isn't possible.
            out.append(data[pos])
            pos += 1
        else:
            tstp, tstl = _search(data, pos+1, sz)
            if (hitl + 1) < tstl:
                out.append(data[pos])
                pos += 1
                flag <<= 1
                if flag & 0x10000:
                    HW.pack_into(out, cmds, c)
                    c, flag = 0, 1
                    cmds = len(out)
                    out.append(0)
                    out.append(0)
                hitl = tstl
                hitp = tstp
            c |= flag
            e = pos - hitp - 1
            pos += hitl
            hitl -= 3
            e |= hitl << 11
            out.extend(HW.pack(e))
        # Advance the flag and refill if required.
        flag <<= 1
        if flag & 0x10000:
            HW.pack_into(out, cmds, c)
            c, flag = 0, 1
            cmds = len(out)
            out.append(0)
            out.append(0)
    # If no cmds in final word, del it.
    if flag == 1:
        del out[-2:]
    else:
        HW.pack_into(out, cmds, c)
    return bytes(out)

Simple test case:
Code:
with open("decompressed_data.bin", 'rb') as f:
    dec = f.read()
o = encode(dec)
# Decode it to check the file against the original.
print(decode(o) == dec)
with open("output.bin", 'wb') as f:
    f.write(o)
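
For a check that doesn't need the attached files, here is a small hand-assembled stream, as a sketch only. It assumes the container layout implied by decode() above: a non-zero first byte, a 16-bit little-endian decompressed size, then 16-bit command words whose bits are consumed LSB first (1 = copy word, 0 = literal byte). `decode_sketch` is a hypothetical, trimmed-down stand-in for decode() so the block runs on its own.

```python
def decode_sketch(data):
    """Minimal decoder mirroring decode() above; illustrative only."""
    size = data[1] | (data[2] << 8)
    src = iter(data[3:])
    out = bytearray()
    cmd = 1  # sentinel value: forces a refill on the first pass
    while len(out) < size:
        if cmd == 1:
            # 16 flag bits, little-endian; bit 16 marks when the word is spent.
            cmd = 0x10000 | next(src) | (next(src) << 8)
        if cmd & 1:
            word = next(src) | (next(src) << 8)
            length, distance = (word >> 11) + 3, (word & 0x7FF) + 1
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.append(next(src))
        cmd >>= 1
    return bytes(out)


stream = bytes([
    0x01,        # non-zero first byte: payload is compressed
    0x0A, 0x00,  # decompressed size = 10
    0x04, 0x00,  # command word 0x0004: literal, literal, copy (LSB first)
    0x41, 0x42,  # literals "A", "B"
    0x01, 0x28,  # copy word 0x2801: length 8, distance 2
])
result = decode_sketch(stream)
```

Here `result` comes out as "AB" repeated five times: two literals followed by one overlapping copy.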

Raccoon Sam

  • Jr. Member
  • **
  • Posts: 59
  • Left is Right and Right is Wrong
Re: LZSS compressor needed
« Reply #2 on: June 26, 2017, 04:26:32 pm »
wow, that was insanely fast
Your code is accurate and elegant, thank you so much!

EDIT: you might want to submit this, if you feel like it. After all, this is the first real toolset for Pocky & Rocky.
« Last Edit: June 26, 2017, 04:33:21 pm by Raccoon Sam »