Topic: How to tell type of compression used?

justin3009

How to tell type of compression used?
« on: June 10, 2013, 08:24:15 pm »
I decided to go back to a project I've been wanting to do for ages: Sailor Moon - Another Story.

I know the compression routine has been cracked, since FuSoYa did so.  I've already contacted him, though the chances of getting a response are slim.  Either way, I want to figure out how you can tell which type of compression scheme is used.

I've located what I'm pretty certain is the routine in this game, and I'm rather curious what the tell-tale signs are for the compression used in games.

http://pastebin.com/m9GjMseR - Here's the gist of what I believe is the routine, along with the surrounding code.
'We have to find some way to incorporate the general civilians in the plot.'

'We'll kill off children in the Juuban district with an infection where they cough up blood and are found hanging themselves from cherry blossom trees.'

KaioShin

Re: How to tell type of compression used?
« Reply #1 on: June 11, 2013, 03:15:15 am »
Well, you just need to know how the compression schemes work; then they are pretty distinct. Is it reading a control byte that tells it to copy the same byte multiple times? RLE. Is it traversing down a tree? Huffman. Is it working with a buffer of previously output data and back-referencing parts of it? LZ-whatever (it really does not matter which).
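To make the RLE case concrete, here is a minimal sketch in Python. The format is made up for illustration and is not claimed to be what any particular game uses: a control byte with the high bit set means "repeat the next byte (control & 0x7F) + 1 times", and otherwise it means "copy the next control + 1 bytes verbatim". The function name `rle_decode` is likewise just a placeholder.

```python
def rle_decode(data: bytes) -> bytes:
    """Decode a hypothetical control-byte RLE stream (format assumed above)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        count = (control & 0x7F) + 1
        if control & 0x80:
            # Run: a single value byte repeated `count` times.
            out += bytes([data[pos]]) * count
            pos += 1
        else:
            # Literal block: `count` bytes copied unchanged.
            out += data[pos:pos + count]
            pos += count
    return bytes(out)
```

If a decompression routine reads a byte, splits it into "what to do" and "how many", and then loops, it probably looks like this once translated.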

You can always write a "dumb" decoder by just translating the decompression ASM routine instruction by instruction into your language of choice.
All my posts are merely personal opinions and not statements of fact, even if they are not explicitly prefixed by "In my opinion", "IMO", "I believe", or similar modifiers. By reading this disclaimer you agree to reply in spirit of these conditions.

LostTemplar

Re: How to tell type of compression used?
« Reply #2 on: June 11, 2013, 04:00:12 am »
I only took a quick look, but it's very likely some LZSS variant. Why?

Code:
$80/9BDA C2 30       REP #$30
$80/9BDC A7 10       LDA [$10]         ; read first flag bits
$80/9BDE E6 10       INC $10
$80/9BE0 E6 10       INC $10
$80/9BE2 85 0A       STA $0A
$80/9BE4 A0 10 00    LDY #$0010        ; initially 16 flag bits left (the two bytes just read)

$80/9BE7 46 0A       LSR $0A           ; get next flag bit
$80/9BE9 88          DEY
$80/9BEA D0 0B       BNE $0B [$9BF7]   ; need to read new flag bits?
...
(read new flag bits if necessary)
...

$80/9BF7 90 0E       BCC $0E [$9C07]   ; was flag bit cleared or set?
...
(branches back to $9BE7)

It loads two bytes (16 bits) into $0A, and each iteration checks whether the LSB was set or not. If it runs out of flag bits (all 16 bits were checked), it reads the next two bytes. How it reacts to each flag bit being set or cleared depends on the concrete algorithm.

This pattern almost always indicates an LZSS variant.
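In a high-level language the same flag-bit pattern might be sketched as below. This is only an illustration: the class name `FlagReader` and its fields are made up, and the word is reloaded lazily (when the next bit is requested) rather than exactly where the original routine does it. The little-endian word read matches how the 65816 loads a 16-bit value.

```python
class FlagReader:
    """Hands out flag bits LSB first, fetching a new 16-bit word when empty."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos      # read position, shared with the payload bytes
        self.flags = 0      # remaining flag bits (plays the role of $0A)
        self.left = 0       # how many flag bits are still unread (like Y)

    def next_bit(self) -> int:
        if self.left == 0:
            # Fetch the next two bytes as a little-endian word.
            self.flags = int.from_bytes(self.data[self.pos:self.pos + 2], "little")
            self.pos += 2
            self.left = 16
        bit = self.flags & 1    # the bit that LSR would shift into carry
        self.flags >>= 1
        self.left -= 1
        return bit
```

A decoder would call `next_bit()` once per step and then read either a literal or a back-reference from `data` at `pos`, depending on the result.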

justin3009

Re: How to tell type of compression used?
« Reply #3 on: June 11, 2013, 02:03:37 pm »
I believe you're correct.  From what I've been reading, it definitely seems like a form of LZSS.  I'm scouring the internet right now to see if there's any utility that works with the variant this game uses, but I've tried a couple to no avail so far, sadly.

If nothing is found, I'll have to relearn some programming and figure out how to work everything again.  (Not really a WORST case scenario, as it'll definitely be helpful in the long run.)

Edit: That, or force the game to load all graphics decompressed.  (I originally had this in mind, actually, though I failed to make it work.  The game really doesn't have that much graphics-wise.)

Edit 2: Could someone point me in the right direction on learning how to write code for decompressing graphics?  I'm... rather lost on where to begin.
« Last Edit: June 12, 2013, 02:29:18 pm by justin3009 »

henke37

Re: How to tell type of compression used?
« Reply #4 on: June 12, 2013, 02:36:49 pm »
There is nothing special about graphics when it comes to lossless compression.

justin3009

Re: How to tell type of compression used?
« Reply #5 on: June 12, 2013, 02:42:17 pm »
But considering I have no clue how to even begin writing code for decompression, it's pretty much all 'special' to me for now.

Bregalad

Re: How to tell type of compression used?
« Reply #6 on: June 12, 2013, 02:55:32 pm »
Quote
There is nothing special about graphics when it comes to lossless compression.
This is actually not true. The two-dimensional nature of graphics can be exploited heavily for better "prediction" of the data being compressed.
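As one small illustration of the idea (a generic technique, not something taken from this game): if each row of an image is XORed with the row above it before compression, vertically repeated pixels turn into zeros, which a general-purpose compressor then handles well. The function names and the one-byte-per-pixel layout with a fixed row width are assumptions made for the sketch.

```python
def predict_rows(pixels: bytes, width: int) -> bytes:
    """XOR every row with the row above it; the first row is kept as-is."""
    out = bytearray(pixels[:width])
    for i in range(width, len(pixels)):
        out.append(pixels[i] ^ pixels[i - width])
    return bytes(out)


def unpredict_rows(residual: bytes, width: int) -> bytes:
    """Invert predict_rows by rebuilding each row from the one above it."""
    out = bytearray(residual[:width])
    for i in range(width, len(residual)):
        out.append(residual[i] ^ out[i - width])
    return bytes(out)
```

The step is lossless, so it can be placed in front of any of the usual schemes (RLE, LZ, Huffman) and undone after decompression.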

Pikachumanson

Re: How to tell type of compression used?
« Reply #7 on: June 12, 2013, 03:24:33 pm »
Quote
But considering I have no clue how to even begin writing code for decompression, it's pretty much all 'special' to me for now.

I'd like to learn how to do this as well. I would love to make something like Maxim's Phantasy Star decompressor.

justin3009

Re: How to tell type of compression used?
« Reply #8 on: June 12, 2013, 04:01:39 pm »
The community definitely needs a good explanation of how decompression works and where to start.  So maybe this topic in general could be used for that, covering various compression formats?

KaioShin

Re: How to tell type of compression used?
« Reply #9 on: June 12, 2013, 04:05:59 pm »
Quote
You can always write a "dumb" decoder by just translating the decompression ASM routine instruction by instruction into your language of choice.

*Ahem*

That carries you two thirds of the way. Then you just need to write the data you get into an image format; with the right library, that should be trivial.
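For that last third, here is a rough sketch that needs no library at all. It assumes the decompressed data is in the standard SNES 4bpp planar tile format (32 bytes per 8x8 tile), which has not been confirmed for this game, and it writes the palette indices as a grayscale plain-text PGM image. The function names and the one-tile-wide layout are choices made purely for the example.

```python
def decode_tile_4bpp(tile: bytes) -> list:
    """Turn one 32-byte SNES 4bpp tile into 8 rows of 8 palette indices (0-15)."""
    rows = []
    for y in range(8):
        # Planes 0/1 are interleaved in the first 16 bytes, planes 2/3 in the rest.
        planes = (tile[2 * y], tile[2 * y + 1],
                  tile[16 + 2 * y], tile[16 + 2 * y + 1])
        row = []
        for x in range(8):
            shift = 7 - x  # bit 7 is the leftmost pixel
            value = 0
            for p, plane in enumerate(planes):
                value |= ((plane >> shift) & 1) << p
            row.append(value)
        rows.append(row)
    return rows


def tiles_to_pgm(data: bytes) -> str:
    """Stack all complete tiles vertically and return a plain-text (P2) PGM."""
    rows = []
    for off in range(0, len(data) - 31, 32):
        rows.extend(decode_tile_4bpp(data[off:off + 32]))
    lines = ["P2", f"8 {len(rows)}", "15"]
    lines += [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.pgm` file gives something most image viewers can open; applying the real palette would be a separate step.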

justin3009

Re: How to tell type of compression used?
« Reply #10 on: June 12, 2013, 04:23:12 pm »
Cripes :|  There seems to be a lot more to this than I thought.  Is there a specific document or something similar to look at to get a head start on it?

LostTemplar

Re: How to tell type of compression used?
« Reply #11 on: June 12, 2013, 05:01:34 pm »
Essentially what KaioShin said. It really isn't that hard; just mimic the assembly code 1:1 in e.g. low-level C/C++ and then iteratively try to recognize high-level constructs (functions, conditionals...). A typical LZSS decompressor is usually not longer than a screen page of high-level language code.

One example decompressor is http://www.romhacking.net/utilities/819/. You could try to understand it with the help of Wikipedia or any other LZSS resource and then apply it to the concrete compression your game uses.

FAST6191

Re: How to tell type of compression used?
« Reply #12 on: June 12, 2013, 07:10:47 pm »
For those wishing to learn how compression works, I highly suggest https://ece.uwaterloo.ca/~ece611/LempelZiv.pdf (PDF link). Despite the name, it does kind of cover Huffman and the others at a speaking/high level. After that it is pretty much "so this method has 12 bytes between flags and a 10/6 split between distance to read back and length of read as opposed to the more common 9/7 one". The changes there will break any existing decompression tool, but given that I just described two of the big three changes (the third being that the flag bytes are different) in a single sentence, it is not usually that troubling if you already have a reference implementation.
A pretty nice example implementation (relatively clean and easy-to-follow code) of these compression schemes (in this case the GBA/DS "BIOS" style ones and other common variations) is Cue's decompression tools: http://www.romhacking.net/utilities/826/
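To show how those knobs fit together, here is a sketch of an LZSS decoder with them exposed as parameters. Everything concrete in it is an assumption for illustration, not the format of any particular game: 8-bit flag bytes read LSB first, a 16-bit little-endian back-reference with the distance in the high bits, a stored distance of 0 meaning "one byte back", and the `min_length` bias. The name `lzss_decode` is made up as well.

```python
def lzss_decode(data: bytes, out_size: int, len_bits: int = 4,
                min_length: int = 3, literal_flag: int = 1) -> bytes:
    """Decode a generic LZSS stream; the distance gets the 16 - len_bits high bits."""
    out = bytearray()
    pos = 0
    while len(out) < out_size:
        flags = data[pos]               # one flag byte covers up to 8 items
        pos += 1
        for _ in range(8):
            if len(out) >= out_size:
                break
            bit = flags & 1
            flags >>= 1
            if bit == literal_flag:
                out.append(data[pos])   # literal: copy one byte from the input
                pos += 1
            else:
                pair = data[pos] | (data[pos + 1] << 8)
                pos += 2
                distance = (pair >> len_bits) + 1
                length = (pair & ((1 << len_bits) - 1)) + min_length
                # Byte-by-byte copy so overlapping references repeat correctly.
                for _ in range(length):
                    out.append(out[-distance])
    return bytes(out[:out_size])
```

Switching from a 12/4 split to a 10/6 one is then just `len_bits=6`, and a format that uses a cleared bit for literals is `literal_flag=0`. A format with 16-bit flag words or a different byte order would need the corresponding lines changed.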

I will grant that things change a bit between high-level programming (lots of IFs and loops) and assembly (flags, checks and other nastiness), but if you understand the idea behind flags, splits and suchlike, then it is no great bother.