
Author Topic: PSX data files, and editing text within them.  (Read 7553 times)

milkmanjack

PSX data files, and editing text within them.
« on: May 29, 2012, 02:02:26 pm »
So recently I started messing with editing dialogue for Playstation 1 games, and I ran into an interesting problem I can't seem to really understand. I've tried quite a few games, and the problem is pretty much always there, so I assume it has to do with how the Playstation stores its data.

Basically, whenever I find some dialogue or text meant to be loaded in a file, there is always, ALWAYS, some totally unrelated set of bytes every few entries within the file. Sometimes predictable, sometimes not.

Here's an example of an English game, with the dialogue in ASCII.
Code: [Select]
Dymÿlos "Youÿ're not ÿusing yoÿur brainý,
It seems to be putting FF every 8 bytes or so within the dialogue, in this particular scenario, but then most of the dialogue I can even recognize as dialogue looks like this:

Code: [Select]
There a9・too many members in the party. Who will マleav>・
This is obviously a message indicating that there are too many members in the party, and asking you who will leave. Then you get things that seem like dialogue, and are garbled beyond legibility.

Code: [Select]
"I'm going home.  If you need my help・ c」・fiウPyrigl."汝gtS 挺約ee hangin';ouニqujメ・ィⅵh橇,゚ justセn?d
I have a similar problem with FF7 in Japanese. I've already mapped out all the kana characters, so I can grab bits of a conversation, but it seems to kind of switch modes to print kanji. So if a kanji is near, I stop being able to read the dialogue. It also is injecting those random bytes around it, as well.

Code: [Select]
ビッグス.「さ[FF]すが、ソルジャー[kanji coming up...]ミB)[kanji passed]でもよ、[kanji coming up].M.ゅ[FF][kanji passed](しんら)ゲルー[FF]プ 【アバランチ】[the rest of the dialogue is cut short, or just not in a readable form]
I really am not sure of where to go from here. I tried reading directly from the data files in the disks, and the results are the same. The text seems to be compressed or something. I've tried doing searches for related stuff, but I can't seem to find the right way to phrase it.

Klarth

Re: PSX data files, and editing text within them.
« Reply #1 on: May 29, 2012, 03:17:55 pm »
The first game is Tales of Destiny, which is certainly compressed using an LZ+RLE variant.  I think there are publicly released compression utilities for the Tales games.

So with LZ, you have a flag byte whose 8 bits describe the next 8 data items.  Each item is either a single uncompressed byte or a two-byte pair that refers back to data previously encountered (this is where the compression savings come from).  In the top case with the FF byte, every bit is 1: a 1 bit means an uncompressed literal, where a 0 bit means a two-byte pair.  After the data window (previously encountered data) starts filling up, the compressor can reference back to that data.  This is why you can somewhat read LZ-compressed text at the start, but not much further in.
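To make the flag byte idea concrete, here's a rough Python sketch that splits one flag byte into its 8 literal/pair decisions. The names `flag_bits` and `describe` are made up for illustration, and whether the bits are consumed lowest-first or highest-first differs between formats, so the `msb_first` switch is an assumption you'd have to check against real data.

```python
def flag_bits(flag, msb_first=False):
    """Split one flag byte into 8 booleans: True = literal byte, False = two-byte pair.

    The bit order is an assumption; some formats read bit 0 first, others bit 7.
    """
    order = range(7, -1, -1) if msb_first else range(8)
    return [bool((flag >> i) & 1) for i in order]


def describe(flag, msb_first=False):
    """Render a flag byte as a string of L (literal) and P (pair) markers."""
    return "".join("L" if bit else "P" for bit in flag_bits(flag, msb_first))


print(describe(0xFF))                  # LLLLLLLL
print(describe(0xFB))                  # LLPLLLLL
print(describe(0xFB, msb_first=True))  # LLLLLPLL
```

An FF flag means "the next 8 items are all plain bytes", which is why the text right after it stays readable.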

milkmanjack

Re: PSX data files, and editing text within them.
« Reply #2 on: May 29, 2012, 05:33:19 pm »
What do you mean when you say it describes it? I understand how a 1 bit indicates that the next byte isn't compressed, but what exactly does it mean when it's 0?

Code: [Select]
4BE:3300h: 4E 65 75 65 73 74 61 64 FF 74 20 41 72 65 6E 61  Neuestadÿt Arena
4BE:3310h: 00 FF 43 68 61 6C 6C 65 6E 67 FF 65 72 73 27 20  .ÿChallengÿers' 
4BE:3320h: 57 61 69 FF 74 69 6E 67 20 52 6F 6F FF 6D 00 0C  Waiÿting Rooÿm..
4BE:3330h: 54 68 65 72 65 FB 20 61 55 A0 74 6F 6F 20 6D FF  Thereû aU too mÿ
4BE:3340h: 61 6E 79 20 6D 65 6D 62 FE 3F A0 20 69 6E 20 74  any membþ?  in t
4BE:3350h: 68 65 FF 20 70 61 72 74 79 2E 0A FF 57 68 6F 20  heÿ party..ÿWho 
4BE:3360h: 77 69 6C 6C 9F 20 6C 65 61 76 5A A0 71 A5 3F FF  willŸ leavZ q¥?ÿ

This seems to be the beginning of the string, and it is consistent for the most part, but I have no idea how to parse it after a while...

Just not sure what to do with the data.

Klarth

Re: PSX data files, and editing text within them.
« Reply #3 on: May 29, 2012, 06:01:27 pm »
I don't have the source code readily available now, but I think there are posts about the specifics on this forum.  There are some descriptions about LZ algorithms if you google for them.  Popular variants are LZ77, LZ78, LZW, and LZSS.

So let's take a quick look at approximately how LZ works.  Say your LZ bit flag byte is FE (ie. 1111 1110).  The decompressor would read 7 literal bytes from the stream, storing them into the so-called sliding window and outputting them directly to the decompressed stream.  Then a two-byte pair is read.  Generally speaking, 12 bits define where in the sliding window a "match" is found and 4 bits define the length of the match.  12 bits means that the sliding window is 4096 bytes large...so it can reference matches from a ways back.
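Here's a small Python sketch of pulling the 12-bit position and 4-bit length out of a two-byte pair. `split_pair` and `length_high` are hypothetical names, the example values are arbitrary, and which end of the pair holds the length (and the byte order of the pair itself) varies from format to format, so both layouts below are assumptions.

```python
WINDOW_SIZE = 1 << 12   # 12 position bits -> 4096-byte sliding window
MAX_RAW_LENGTH = 0xF    # 4 length bits -> raw values 0..15


def split_pair(pair, length_high=True):
    """Split a 16-bit pair into (position, raw_length).

    length_high=True assumes the length sits in the top 4 bits;
    length_high=False assumes it sits in the bottom 4 bits.
    """
    if length_high:
        return pair & 0x0FFF, pair >> 12
    return pair >> 4, pair & 0x000F


print(split_pair(0x3009))                     # (9, 3)
print(split_pair(0x0093, length_high=False))  # (9, 3)
```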

So what does a match mean?  A match is a previously occurring string of at least 3 characters.

Code: [Select]
I got this awesome sword at sword mart!
                            ^
If the compressor is where the caret is, it will look at 4096 bytes of previous data (the sliding window) to find a long match.  In this scenario, there's a "sword " match beforehand.  These 6 bytes will be compressed down to a 2-byte location/length pair (plus the overhead of 1 bit for the LZ bit flag).  The reverse operation happens in decompression.  You can think of it as a dynamically created dictionary compression.
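The search the compressor does at the caret can be sketched in Python like this. It's a naive brute-force version, not how any particular game's tool does it; `find_longest_match` is a made-up name, and the 3-byte minimum and 18-byte maximum match lengths are assumed limits.

```python
def find_longest_match(data, pos, window_size=4096, min_len=3, max_len=18):
    """Return (distance_back, length) of the longest earlier match, or None.

    Brute-force scan of the sliding window; min_len and max_len are assumptions.
    """
    best = None
    start = max(0, pos - window_size)
    for cand in range(start, pos):
        length = 0
        while (length < max_len
               and pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length >= min_len and (best is None or length > best[1]):
            best = (pos - cand, length)
    return best


text = b"I got this awesome sword at sword mart!"
caret = text.index(b"sword", 20)  # position of the second "sword"
print(find_longest_match(text, caret))  # (9, 6)
```

The result says "go back 9 bytes and copy 6", which is the "sword " match described above.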

milkmanjack

Re: PSX data files, and editing text within them.
« Reply #4 on: May 29, 2012, 06:45:36 pm »
So just to be sure I'm understanding...

If we start compressing the string "I got a sword at the sword market," you'd get something like:

Code: [Select]
I got a [FF]sword at[F8] the [0095][FF] market
                     ^

Read the next 5 bytes [F8 = 11111000], then read the next 12 bits to get the location in the past 4096 bytes to look, and read the next 4 bits to determine how many bytes to read. In this case, start at byte 9 (I assume that's how it's formatted), then read the 5 bytes at that location.

Is that right?
« Last Edit: May 30, 2012, 08:29:23 am by milkmanjack »

Gemini

Re: PSX data files, and editing text within them.
« Reply #5 on: May 30, 2012, 09:40:20 am »
If you want to make sure you understand how it works, check out this piece of source code used to decompress FFVII compressed data. LzssDecode is the function you're looking for.

KingMike

Re: PSX data files, and editing text within them.
« Reply #6 on: May 30, 2012, 09:47:01 am »
Also, most LZ implementations will add 3 to the stored length value, because it is assumed you will only ever copy a match of at least 3 bytes. (Since it takes 2 bytes to store the pair, encoding a 1-byte match wastes space, and while a 2-byte match would break even size-wise, it's still a bit of a waste of processing time.)
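As a quick illustration of that bias in Python (the names are made up, and the 4-bit length field is an assumption; check the actual format):

```python
MIN_MATCH = 3  # assumed bias added to the stored length


def stored_to_length(stored):
    """Turn the 4-bit stored value (0..15) into the real copy length (3..18)."""
    if not 0 <= stored <= 0xF:
        raise ValueError("stored length must fit in 4 bits")
    return stored + MIN_MATCH


def length_to_stored(length):
    """Turn a real match length (3..18) into the 4-bit value the compressor writes."""
    if not MIN_MATCH <= length <= MIN_MATCH + 0xF:
        raise ValueError("match length out of range for a 4-bit field")
    return length - MIN_MATCH


print(stored_to_length(0), stored_to_length(15))  # 3 18
print(length_to_stored(6))                        # 3
```

So a 4-bit field covers matches of 3 to 18 bytes instead of 0 to 15.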

Klarth

Re: PSX data files, and editing text within them.
« Reply #7 on: May 30, 2012, 12:29:47 pm »
You're generally on the right track.  Here's something a bit more thoroughly worked up.

Let's say that we know the caret is where the only compression takes place.  But let's also say that the sliding window starts at $000 (they usually don't) and that there were $800 bytes of text before this (so the current position in the sliding window is $800 at the start of the string).  This means that the second "sword " starts at $81C and the first "sword " at $813 (flag bytes don't get put into the sliding window).  The value is generally "how far behind" you have to look, so $81C - $813 = $009.  The length is 6, but you generally subtract 3 before storing it, since it's assumed a match is at least 3 bytes.  So if the length is the upper 4 bits, then the LZ code is $3009.  (In some formats, the length code could come last, ie. $0093.)

Code: [Select]
I got this awesome sword at sword mart!
[$FF]I got th[$FF]is aweso[$FF]me sword[$F7] at [$3009]mar[$FF]t!

Individual strings won't necessarily start with an LZ bit flag (the start of the compression block will though).  Past this, only reading about the algorithm or reading source code will further your knowledge.
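For what it's worth, here's a minimal Python decoder for the toy stream above. It's only a sketch under the assumptions of this example, not the actual Tales of Destiny or FF7 format: flag bits are read highest bit first (so $F7 means 4 literals, a pair, then 3 literals), the pair is stored high byte first, the upper 4 bits are length minus 3, and the lower 12 bits are the distance back. `lzss_decode` and `stream` are made-up names.

```python
def lzss_decode(src):
    """Decode the toy LZ stream from the example (layout is assumed, see above)."""
    out = bytearray()
    i = 0
    while i < len(src):
        flags = src[i]
        i += 1
        for bit in range(7, -1, -1):       # assumed: highest bit first
            if i >= len(src):
                break                      # trailing flag bits are unused
            if (flags >> bit) & 1:         # 1 = literal byte
                out.append(src[i])
                i += 1
            else:                          # 0 = two-byte pair
                pair = (src[i] << 8) | src[i + 1]
                i += 2
                length = (pair >> 12) + 3  # stored length is biased by 3
                distance = pair & 0x0FFF   # how far behind to look
                if distance == 0 or distance > len(out):
                    raise ValueError("pair points outside the decoded data")
                start = len(out) - distance
                for k in range(length):    # byte by byte, so overlaps work
                    out.append(out[start + k])
    return bytes(out)


stream = (b"\xFF" b"I got th" b"\xFF" b"is aweso" b"\xFF" b"me sword"
          b"\xF7" b" at " b"\x30\x09" b"mar" b"\xFF" b"t!")
print(lzss_decode(stream).decode("ascii"))
# I got this awesome sword at sword mart!
```

Since the distance is relative, it doesn't matter here that the window position was $800 in the walkthrough; a format that stores absolute window positions would need a real ring buffer instead.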

Auryn

Re: PSX data files, and editing text within them.
« Reply #8 on: June 01, 2012, 01:30:38 am »
You could take a look at this thread as well; it probably contains some information on ToD.

As for FF7, you want to look around here, the message board on it and naturally here and especially the LZS archive site.
« Last Edit: June 01, 2012, 05:13:36 am by danke »