Legend of Oasis/Story of thor 2 - compressed font

Started by paul_met, September 11, 2013, 01:26:43 PM


paul_met

Greetings to all.
Many people probably know the wonderful game Legend of Oasis (The Story of Thor 2), which was released for the Sega Saturn. I have long wanted to translate this beautiful game into my native language. The problem is that the dialog fonts are packed somehow. I was able to find the exact location of the font: it is in the file CHR.BIN, starting at address 0x4A800. I find it difficult to determine exactly where the packed data ends. Please help me unpack this compressed font. This font is the only thing keeping me from getting started.
Here is the uncompressed font, taken from a dump of video memory.
Meduza Team

paul_met

#1
Is no one interested, or can no one manage it?
If it helps, here is an archive. It contains a binary file with the font extracted from video memory, plus a palette file for clarity. Font format: 4bpp linear, row-interleaved.
Link: http://rghost.ru/48882097
Meduza Team

CUE

I worked with this game some years ago to translate it into Spanish using a modified font including all Spanish chars, but I never found the font because I'm not a Saturn expert. :banghead:

Now I have analyzed the game again and I was able to decode the data. :beer:
Also, I can encode the data using a fake compression, so the graphics can be modified directly with any of the usual tools. :thumbsup:

But I have some questions: :huh:
- Where did you get the offset 0x4A800? I need to change some offsets to modify some graphics ("Game Over").
- The game uses a variable-width font; where is the width data stored?

Some images, using a grey palette because I don't know where the real palette is stored, and it is not needed:

<--- The extended chars are not usable, but all Spanish chars are added  ;D



paul_met

Wow, great job! Can you share your tools for unpacking and packing the resources? I know where to get the palette.
As for the offset 0x4A800: it can probably be calculated from the sizes of the packed data, but I don't know how yet. I found it by eye. I located the palette for the fonts and loaded it into a tile editor. With the right palette, locating the font was not hard. Unfortunately, I don't yet know where the offsets of the compressed images or the font widths are stored.
Meduza Team

CUE

I'm finishing a tiny tool to replace the 2 decoded fonts and check that the new CHR.BIN works.

paul_met

Meduza Team

CUE

Bad news: the game doesn't run with the modified file. Maybe the pointers are needed, and a Saturn expert as well  :(

paul_met

Was the repacked font larger than the original? Was the remaining compressed data shifted?
I would still like to look at your tool.
Meduza Team

CUE

Quote from: paul_met on October 03, 2013, 04:45:07 PM
Was the repacked font larger than the original? Was the remaining compressed data shifted?
I would still like to look at your tool.
The file can be larger or smaller than the original, but the game crashes either way. All the data is OK, and I can decompress it without problems, but the game doesn't accept the new file. I'll try another encoding to see if it works.

I can't help you if this is a pointer problem. A Sega Saturn expert is needed.

paul_met

And if you make the new compressed data the same size as the old data, does the game still fail to start?
Meduza Team

CUE

The game only works with the same encoded size. Adding or removing a single encoded byte causes a crash. I can make the file bigger by adding zeros at the end and the game runs fine, so the file can be expanded. The pointers to the data are needed.

Tomorrow I'll upload everything, with a brief explanation of the compression. It's a very simple method.

paul_met

Quote from: CUE on October 05, 2013, 06:14:16 AM
The game only works with the same encoded size. Adding or removing a single encoded byte causes a crash. I can make the file bigger by adding zeros at the end and the game runs fine, so the file can be expanded. The pointers to the data are needed.
Someone has promised to share information about the SH2 CPU registers with me. I think that with this information I can find the necessary pointers.
So far I have found the pointers for the dialog text and labels, and I know the necessary offset.

Quote from: CUE on October 05, 2013, 06:14:16 AM
Tomorrow I'll upload everything, with a brief explanation of the compression. It's a very simple method.
Thank you very much.


Some progress. I worked out the pointers that are responsible for the coordinates at which the font is drawn. In our case this doesn't give us anything special, but it's something. I would like to find more values, in particular the ones corresponding to the size of the font.
Meduza Team

CUE

#12
Quote from: paul_met on October 05, 2013, 07:06:59 AM
So far I have found the pointers for the dialog text and labels, and I know the necessary offset.
Do you have all the text codes? I remember various pause codes, a center code, some XXNNNN codes, where NNNN is the offset to another message, ...


The compressed data starts @ CHR.BIN:0x043000, after a big unused Shift-JIS font, and runs until the end of the file.

You can see various blocks, sector-aligned (0x800 bytes) and padded with 0x00:
- 0x043000, 1 file, which I have named 100
- 0x046800, 1 file, which I have named 200
- 0x048000, 1 file, which I have named 300
- 0x049000, 1 file, which I have named 400
- 0x049800, 1 file, which I have named 500
- 0x04A800, 12 files, which I have named 600-611; the first two are the fonts
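
As a small illustration of the sector alignment, the block list can be written down and checked in Python. This is only a sketch: the names SECTOR, BLOCKS and align_up are made up here, and only the offsets, the file numbers and the 0x800 sector size come from the list above.

```python
SECTOR = 0x800  # sector size stated in the list above

# Block start offsets in CHR.BIN mapped to the file numbers used in this thread.
BLOCKS = {
    0x043000: [100],
    0x046800: [200],
    0x048000: [300],
    0x049000: [400],
    0x049800: [500],
    0x04A800: list(range(600, 612)),  # 600-611, the first two are the fonts
}


def align_up(offset: int, sector: int = SECTOR) -> int:
    """Round an offset up to the next sector boundary (the gap is 0x00 padding)."""
    return (offset + sector - 1) // sector * sector


# Every listed block starts on a sector boundary.
for start in BLOCKS:
    assert start % SECTOR == 0, hex(start)
```

align_up gives the offset at which a block following some data of arbitrary length would have to start if the same alignment is kept.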

Each encoded file has a 2-byte header with the encoded size (little-endian value). The first file has a bad size, so maybe it is not needed. The chunks follow the size value. The 3 upper bits of the first byte of each chunk define an "action":
(the numbers are in binary)

code 000
  - 000+00000 ..................... end of data (0x00, as usual)
  - 000+nnnnn ..................... copy next 'nnnnn' encoded bytes

code 001
  - 001+mmmmm nnnnnnnn ............ copy next 'mmmmmnnnnnnnn' encoded bytes

code 010
  - 010+0nnnn bbbbbbbb ............ repeat 'nnnn+4' times byte 'bbbbbbbb'
  - 010+1mmmm nnnnnnnn bbbbbbbb ... repeat 'mmmmnnnnnnnn'+4 times byte 'bbbbbbbb'

code 011 (previous code must be '1nn' or another '011')
  - 011+nnnnn ..................... copy 'nnnnn' more decoded bytes

code 1nn
  - 1nn+xxxxx yyyyyyyy ............ copy '1nn' decoded bytes from offset $-'xxxxxyyyyyyyy'
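
The chunk table above can be sketched as a small Python decoder. This is not the code of the posted tool, only an illustration with some assumptions: the 13-bit value of code 1nn is taken as the distance back from the current write position, code 011 is assumed to keep copying at that same distance, and the 2-byte size header is expected to be stripped by the caller. The names decode_chunks and copy_back are hypothetical.

```python
def copy_back(out: bytearray, distance: int, count: int) -> None:
    """Append count bytes read from `distance` bytes behind the write position."""
    if not 0 < distance <= len(out):
        raise ValueError(f"bad back-reference distance {distance}")
    for _ in range(count):
        out.append(out[-distance])  # byte by byte, so overlapping copies work


def decode_chunks(data: bytes) -> bytes:
    """Decode one chunk stream (the bytes after the 2-byte size header)."""
    out = bytearray()
    pos = 0
    distance = 0  # distance of the most recent 1nn back-reference (assumption)
    while pos < len(data):
        first = data[pos]
        pos += 1
        code, low = first >> 5, first & 0x1F
        if first == 0x00:                 # 000+00000: end of data
            break
        if code == 0b000:                 # 000+nnnnn: short literal run
            out += data[pos:pos + low]
            pos += low
        elif code == 0b001:               # 001+mmmmm nnnnnnnn: long literal run
            count = (low << 8) | data[pos]
            pos += 1
            out += data[pos:pos + count]
            pos += count
        elif code == 0b010:               # run-length chunk
            if low & 0x10:                # 010+1mmmm nnnnnnnn bbbbbbbb
                count = (((low & 0x0F) << 8) | data[pos]) + 4
                pos += 1
            else:                         # 010+0nnnn bbbbbbbb
                count = (low & 0x0F) + 4
            out += bytes([data[pos]]) * count
            pos += 1
        elif code == 0b011:               # 011+nnnnn: continue the last copy
            copy_back(out, distance, low)
        else:                             # 1nn+xxxxx yyyyyyyy: copy 4..7 bytes
            distance = (low << 8) | data[pos]
            pos += 1
            copy_back(out, distance, code)
    return bytes(out)
```

If the distance convention is different in the real format (for example distance+1), only the line that computes distance needs to change.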


The tools: http://www.mediafire.com/?tgj90qss6srp5z5 (CHR.BIN is not included)
You are free to use, blah blah blah, ...

Run "dec.bat" to decode all files. A log file is created, "decode.log".

"font.exe" replaces the 2 encoded fonts with the 2 decoded fonts, but you need to search for and modify the pointers.


paul_met

#13
The dialog text pointers have a simple format: each is 2 bytes and holds the address of the beginning of a line. The various labels are at offset 0x2da000. That is for the file TH2.LOW. For 0TH2.BIN the offset is 0x4000.


Thank you for the program. I will look into it, and if I have questions I'll get back to you! ;)

October 07, 2013, 11:05:22 AM - (Auto Merged - Double Posts are not allowed before 7 days.)

What about real (not fake) compression of the extracted data? And would it be possible to fit the new compressed data to the size of the old data?

October 08, 2013, 11:05:52 AM - (Auto Merged - Double Posts are not allowed before 7 days.)

I checked: the fake compression does not work. I rebuilt the disc image so that it uses the new CHR.BIN, but the game freezes after you select "new game". Then I replaced font number 2 with the one used in the Japanese version of the game. These fonts differ in size, but with the substituted font the game worked perfectly. Conclusion: for replacing the font the pointers are not needed, since the data is decompressed sequentially. We just need to compress the fonts correctly.
PS: By the way, font number 1 can be ignored, as it is not used.




And another question: what do the two bytes after the size of the compressed data mean? For example, "28 00"? Why are they the same for both fonts?
Meduza Team

CUE

The Japanese font is smaller than the English font and seems to work.

You can add 1 byte:
- change 0x0004AB9C, from 0x17 to 0x18 (1 more byte)
- change 0x0004B5B2, from 0x79 to 0x78 (copy 0x18 more bytes instead of 0x19)
- insert byte 0x61 before 0x0004B5B3 (final font) -> copy 1 more byte (0x18 before + 0x01 now = 0x19, same as the original font)
- remove one 0x00 from the end of the file
The font is the same, but now the game crashes.
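
For anyone who wants to repeat this experiment, the four steps above can be written as a short Python sketch. The function name add_one_encoded_byte is made up, and the checks on the original byte values are an added safety measure; the offsets and values are the ones listed above.

```python
def add_one_encoded_byte(chr_data: bytes) -> bytes:
    """Apply the one-byte test patch to the contents of CHR.BIN."""
    data = bytearray(chr_data)
    # Refuse to patch a file that does not hold the expected original bytes.
    if data[0x4AB9C] != 0x17 or data[0x4B5B2] != 0x79:
        raise ValueError("unexpected bytes at the patch offsets")
    if data[-1] != 0x00:
        raise ValueError("no 0x00 padding byte at the end of the file")
    data[0x4AB9C] = 0x18        # 1 more encoded byte
    data[0x4B5B2] = 0x78        # 011+11000: copy 0x18 more bytes
    data.insert(0x4B5B3, 0x61)  # 011+00001: copy 1 more byte
    del data[-1]                # drop one padding byte, file size is unchanged
    return bytes(data)
```

The two copy chunks together still copy 0x18 + 0x01 = 0x19 bytes, so the decoded font should be identical; only the encoded length grows by one byte.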

2800 -> 0010100000000000 binary -> code=001 value=0100000000000=0x800 -> copy next 0x800 encoded bytes (a literal run, same as code 001 in the list of chunks)
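
The same bit split, written out in Python as a quick check (the variable names are only for illustration):

```python
first, second = 0x28, 0x00            # the two bytes after the size header

code = first >> 5                     # upper 3 bits of the first byte
count = ((first & 0x1F) << 8) | second  # lower 5 bits plus the second byte

assert code == 0b001                  # literal-run chunk
assert count == 0x800                 # number of bytes that follow as-is
```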

paul_met

So what do we do now? What are my options? Apply real compression?
Meduza Team

CUE

The decompression seems correct; it is a very simple RLE+LZ variation, so the optimal compression (not "real") has the same problem. If you exceed the size by one byte, the game crashes :(
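
For reference, a "fake" compression in the sense used in this thread can be sketched in Python as a store-only encoder that emits nothing but 001 literal chunks followed by the 0x00 end marker. This is an illustration, not the code of the posted tool: the name encode_stored is hypothetical, and the 2-byte size header is left out because the thread does not say exactly what it counts.

```python
MAX_LITERAL = 0x1FFF  # largest count that fits in the 13 bits of a 001 chunk


def encode_stored(raw: bytes) -> bytes:
    """Wrap raw bytes in 001 literal chunks and add the end-of-data byte."""
    out = bytearray()
    for start in range(0, len(raw), MAX_LITERAL):
        piece = raw[start:start + MAX_LITERAL]
        count = len(piece)
        out.append(0x20 | (count >> 8))  # 001 + upper 5 bits of the count
        out.append(count & 0xFF)         # lower 8 bits of the count
        out += piece
    out.append(0x00)                     # 000+00000: end of data
    return bytes(out)
```

For 0x800 bytes of input the stream starts with 28 00, the same two bytes seen at the start of the original fonts. The output is always a few bytes longer than the input, so it cannot fit where the encoded size must not grow.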

You need to know how the game does it: whether there is a limit, whether pointers are used, ...

paul_met

#17
You were right. If the compressed data exceeds the original size, the game will not work, even though the decompression is still performed; I learned this by watching the video memory.
I think there is another way: reduce the size of the font data by removing what is superfluous, then compress the data. The new compressed data is then guaranteed to come out smaller than the original.
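
A minimal Python sketch of that idea: check that the new encoded data fits into the space of the original and fill the rest with 0x00. The name fit_to_original is made up, and it is only an assumption that the game ignores extra 0x00 bytes after the end-of-data chunk; nothing in this thread confirms it yet.

```python
def fit_to_original(encoded: bytes, original_size: int) -> bytes:
    """Pad encoded data with 0x00 up to the original encoded size."""
    if len(encoded) > original_size:
        raise ValueError(
            f"encoded data is {len(encoded) - original_size} byte(s) too long"
        )
    # Assumption: bytes after the end-of-data chunk are never read by the game.
    return encoded + bytes(original_size - len(encoded))
```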

October 11, 2013, 12:07:34 AM - (Auto Merged - Double Posts are not allowed before 7 days.)

By the way, your tool does not decompress all the data in CHR.BIN. The compressed logo and title screen are still there at the beginning of the file. Would it be possible to make the tool ask for the address at which to start unpacking?
Meduza Team

CUE

I only decoded the final section, where the font is stored, looking for more fonts.
You can decode more sections simply by adding: Decode(chr, offset, number); (number is used as the file name)

Tomorrow I'll modify the tool to accept the offset as a parameter.

paul_met

Quote from: CUE on October 11, 2013, 02:58:02 AM
I only decoded the final section, where the font is stored, looking for more fonts.
You can decode more sections simply by adding: Decode(chr, offset, number); (number is used as the file name)

Tomorrow I'll modify the tool to accept the offset as a parameter.
All right.
Tell me, would you be willing to try writing at least a simple data compressor? Only if you have the time for it, of course.
I finally got a file with Saturn documentation. I will study it. If you're interested, I can share the information.
Meduza Team