
Author Topic: Search for compressed graphics in rom.  (Read 1190 times)

Under_Nerd22

Search for compressed graphics in rom.
« on: April 14, 2020, 05:23:08 am »
Please don't tell me up front that this isn't for beginners and so on; I do this because I enjoy it, thanks. So far I have learned to find pointers, change text, change graphics (uncompressed) and much more (I have even started learning assembly). Then I remembered coming across the Final Fantasy ROMs on the WonderSwan Color, where 99% of the graphics are compressed. I looked at what people write on the forums and read about the main compression algorithms (RLE, LZ77, etc.). But that is exactly my question: how do you then find the graphics in the code? It doesn't have to be WonderSwan; any console would be interesting.

nesrocks

Re: Search for compressed graphics in rom.
« Reply #1 on: April 14, 2020, 11:04:07 am »
Depends on the system and the debugging tools available. I can speak for NES though because that's what I'm familiar with.
You work backwards because you know how the hardware displays images. So you set breakpoints on writes to specific PPU memory addresses and analyse the code in the debugger to understand where the data is coming from.

FAST6191

Re: Search for compressed graphics in rom.
« Reply #2 on: April 14, 2020, 01:03:26 pm »
Some consoles have compression algorithms at BIOS, firmware or maybe just SDK level. Implementing compression beyond basic RLE is tedious, so developers will often default to whatever is provided; in the BIOS and firmware cases these are specific calls you can latch onto or note when debugging.
Known compression also leaves fingerprints (indeed type 10, type 11 and type 40 on the GBA/DS are named for the first byte, which is the first thing I look at when pulling apart files, along with where the magic stamp might land).
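To make the fingerprint idea concrete, here is a minimal sketch of the sort of scan such tools do for type 10. It assumes the usual GBA BIOS header layout as I understand it (first byte 0x10, then a 24-bit little-endian decompressed size). The function name, the 4-byte alignment and the size cap are my own assumptions for illustration, and a scan like this will throw up plenty of false positives that you then have to test by actually decompressing.

```python
def find_lz10_candidates(rom: bytes, max_size: int = 0x40000, align: int = 4):
    """Return (offset, decompressed_size) pairs that look like a type 10 header.

    Assumptions (not from any particular tool): headers sit on `align`-byte
    boundaries and a plausible decompressed size is between 1 and `max_size`.
    """
    hits = []
    for off in range(0, len(rom) - 3, align):
        if rom[off] != 0x10:  # the first byte names the compression type
            continue
        # the next three bytes hold the decompressed size, little-endian
        size = int.from_bytes(rom[off + 1:off + 4], "little")
        if 0 < size <= max_size:
            hits.append((off, size))
    return hits


if __name__ == "__main__":
    # Fake "ROM": 8 bytes of padding, then a header claiming 0x200 bytes.
    fake_rom = bytes(8) + bytes([0x10, 0x00, 0x02, 0x00]) + bytes(4)
    for off, size in find_lz10_candidates(fake_rom):
        print(f"candidate at 0x{off:06X}, decompressed size 0x{size:X}")
```

Tightening `max_size` or requiring that the candidate actually decompresses to the stated size is how you would cut the list down in practice.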

Speaking of known: very few devs will implement their own fully custom compression from the ground up, and even then it tends to be a minor variation on existing types (I never expect to see a regenerative compression, think PAR2 files, in a game, so if you know of something other than dictionary or self-referencing schemes then congrats on winning whatever the Nobel prize for computer algorithms is).
To that end it helps to know what the basic obvious implementations are:
https://ece.uwaterloo.ca/~ece611/LempelZiv.pdf
Despite the name it covers Huffman and others well enough for the purposes of this discussion and learning the ropes. About the only things it does not cover that you might see in games are tile map level compression (if you have a blank background you don't tend to have 30 same-colour tiles in VRAM, and may indeed reuse border images or matching characters), pointer level compression (repeating words/phrases; think how you might encode The Twelve Days of Christmas or There Was an Old Lady Who Swallowed a Fly to minimise the space required), and maybe dual width encoding or placeholders, most of which are obvious.
It will also not technically cover some of the filtering methods. For instance the GBA and DS have a method where, despite the hardware only working in 4 bits per pixel, two-colour graphics like a basic font can still be stored at a lower bit depth (two colours need only 1 bit per pixel) and expanded at runtime. Indeed this is so basic that it is often a simple tile viewer option. The GBA and DS also have a filtering option where values are reduced to differences rather than raw numbers, so if you have something counting up by 1 each time you get a nice 111111... rather than 123456... (or with 0s in there if we are doing byte level stuff like we probably should).
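The two filtering ideas above are simple enough to sketch. This is an illustration rather than the BIOS routines themselves: the function names are made up, the difference filter works on single bytes, and the bit and nibble order in the 1bpp expansion (lowest bit first, lower nibble first) is an assumption you would need to check against the actual game.

```python
def diff_filter(raw: bytes) -> bytes:
    """Store each byte as the difference from the previous one (mod 256)."""
    out = bytearray()
    prev = 0
    for b in raw:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)


def diff_unfilter(filtered: bytes) -> bytes:
    """Undo diff_filter by keeping a running sum (mod 256)."""
    out = bytearray()
    acc = 0
    for d in filtered:
        acc = (acc + d) & 0xFF
        out.append(acc)
    return bytes(out)


def expand_1bpp_to_4bpp(data: bytes, colour: int = 1) -> bytes:
    """Expand 1 bit per pixel data to 4 bits per pixel.

    Assumed layout: bits are read lowest first, and each output byte holds
    two pixels with the earlier pixel in the low nibble. A set bit becomes
    palette index `colour`, a clear bit becomes index 0.
    """
    out = bytearray()
    for byte in data:
        for pair in range(4):
            lo = colour if (byte >> (2 * pair)) & 1 else 0
            hi = colour if (byte >> (2 * pair + 1)) & 1 else 0
            out.append(lo | (hi << 4))
    return bytes(out)


if __name__ == "__main__":
    counting = bytes([1, 2, 3, 4, 5, 6])
    print(list(diff_filter(counting)))            # the run of 1s from the post
    print(diff_unfilter(diff_filter(counting)) == counting)
    print(expand_1bpp_to_4bpp(b"\xff").hex())
```

The point of the difference filter is that the output is much friendlier to a following RLE or LZ pass than the raw counting sequence was.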

As mentioned in the post above, we typically know how the hardware works, and the data has to get into that state to be displayed. Working backwards from that is only marginally more tedious than working back from a simple read.
Knowing how the hardware works might also limit your expectations. You are never going to get one of the crazy compression methods you see in contests, where they are allowed fairly modest specs (but still several orders of magnitude more than most of the consoles we look at around here, or PCs of the day) and a long decompression time (especially by the standards of game load times). Even aspects of LZ and Huffman are a big ask for a lot of machines (see most takes on the 16 bit and older devices), so you tend to see even more basic things like Run Length Encoding.
If there are known examples with source code to look at on the systems you know then, while they might not be easily tweaked to be compatible*, they will give an idea of what can be done.
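For a feel of how basic Run Length Encoding is, here is a minimal decoder sketch. The control byte layout is hypothetical (top bit set means "repeat the next byte", clear means "copy the next bytes as-is", low 7 bits are the count minus one) and is not taken from any specific game or BIOS; real formats differ in exactly these small details.

```python
def rle_decode(data: bytes) -> bytes:
    """Decode a hypothetical control-byte RLE stream.

    ctrl & 0x80 set   -> repeat the following byte (ctrl & 0x7F) + 1 times
    ctrl & 0x80 clear -> copy the following (ctrl & 0x7F) + 1 bytes verbatim
    """
    out = bytearray()
    i = 0
    while i < len(data):
        ctrl = data[i]
        i += 1
        count = (ctrl & 0x7F) + 1
        if ctrl & 0x80:
            out.extend(data[i:i + 1] * count)  # run of one repeated byte
            i += 1
        else:
            out.extend(data[i:i + count])      # literal bytes
            i += count
    return bytes(out)


if __name__ == "__main__":
    packed = bytes([0x83, 0xAA, 0x01, 0x01, 0x02])
    print(rle_decode(packed).hex())
```

A decoder like this is a handful of instructions on even the oldest consoles, which is a large part of why it shows up so often.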

*You can usually zero in on a tile format by random clicking and testing; I would hate to do that for compression, even for a variation I can describe in a sentence. Say for LZ: even with all the same flags and general approach, a different split between the read length and look back distance values (3 bits read length, 5 bits look back distance vs a 4-4 split) is not something you want to be fiddling around to determine.
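To show why that split matters, here is a sketch of a toy LZ decoder where the split is a parameter. Everything about the format is hypothetical and chosen for illustration: a flag byte read highest bit first, a set bit meaning a one-byte back-reference token, the length field in the high bits, and the `min_len` offset of 2. The same bytes decode to different output depending on where you cut the token.

```python
def lz_decode(data: bytes, out_len: int, length_bits: int = 4, min_len: int = 2) -> bytes:
    """Decode a toy LZ stream with a configurable length/distance split.

    Hypothetical format: each flag byte covers up to 8 items, highest bit
    first. Flag bit 0 = literal byte. Flag bit 1 = one token byte whose top
    `length_bits` bits are (length - min_len) and whose remaining bits are
    (distance - 1).
    """
    dist_bits = 8 - length_bits
    out = bytearray()
    i = 0
    while len(out) < out_len and i < len(data):
        flags = data[i]
        i += 1
        for bit in range(7, -1, -1):
            if len(out) >= out_len or i >= len(data):
                break
            token = data[i]
            i += 1
            if flags & (1 << bit):
                length = (token >> dist_bits) + min_len
                distance = (token & ((1 << dist_bits) - 1)) + 1
                if distance > len(out):
                    raise ValueError("back-reference points before start of output")
                for _ in range(length):
                    out.append(out[-distance])  # byte by byte so copies may overlap
            else:
                out.append(token)
    return bytes(out[:out_len])


if __name__ == "__main__":
    stream = bytes([0b00100000, ord("A"), ord("B"), 0x21])
    print(lz_decode(stream, 6, length_bits=4))  # 4-4 split
    print(lz_decode(stream, 6, length_bits=3))  # 3-5 split, same bytes
```

With the wrong split you either get subtly wrong output, as here, or a back-reference that points outside the data, which is at least an obvious failure.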

I speak about the GBA and DS a lot because I know them best. It does also mean that, while compression on the older stuff can be a proper head scratcher that takes a seasoned, assembly-knowing hacker to handle, I would say about 95% of cases on the GBA and DS can be done by someone with very basic tools. For instance there are plenty of tools that will scan for the known types, take simple log files from emulators, decompress the data and let you compress it again, all without you knowing the first thing about the compression formats involved. (Compression in the BIOS means you get a nice list of what compression was used, where, and for how much data, and you may be able to limit it to the part of the game you care about rather than scanning the whole thing: log a session where you just fight the boss you care about editing and...)