[PS1] Checksums of individual files? (Solved)

Started by acediez, September 02, 2020, 11:21:24 AM


acediez

(Continuing on from my last thread, I made a new one because it's a different issue altogether)

I'm currently working on a PS1 game with a file that can't be modified. If I change even a single byte, the game softlocks on the loading screen where it's being read.
It's not that I'm modifying sensitive data: I've already figured out the file contents, and I know for sure that what I'm modifying is plain text data. If I make the same change directly to RAM instead of the BIN, the changes in text show up just fine.
It's not that I'm changing the file's length or position: this happens even when I replace a single byte in the middle of the file.
It's not EDC/ECC data: I recalculate the disc's EDC/ECC data after every change I make. In any case, I'm testing on no$psx, which AFAIK ignores EDC/ECC data entirely, and I get the same issue with other emulators too.

Then there's this:

At the start of each of these files, there are 8 bytes of data whose purpose I haven't been able to figure out. They're ignored by the pointer tables for file sizes and file start offsets, and they're not loaded to RAM. Because of this, I also can't (or don't know how to) set a breakpoint on them and trace what they're being used for.
However, I noticed that if I change it, I get the same softlock on loading screens I get if I change a byte within the actual contents of the file. This led me to believe those first 8 bytes might be used as a checksum for the rest of the file whenever the file is being loaded to RAM. So I would need to either figure out how to update this value after making changes, or bypass this hypothetical file integrity check entirely.

I'm not entirely sure if that's what's going on, so I came to ask for anyone who might've had a similar experience to give me some pointers.
Has anyone had a similar problem and has been able to solve it? If so, what was the cause?
If it is indeed a checksum, how would I go about reverse engineering it? Is that even feasible to do? Are there any commonly used checksum algorithms I should check?
What about skipping integrity checks altogether?

For now the only thing I can think of is to read and compare what's going on in code during the softlock. Weirdly, if I revert the modified byte to its original value (any byte within the file that I might have modified) while the game is running, the game immediately gets unstuck and the loading screen completes, so I guess it's reading the same place over and over again until it gets the value it's expecting. Though to get a complete comparison of both cases (stuck/file modified, unstuck/file unmodified) I'm afraid I'll have to look line by line for a long time. I could really use a complete log of the CPU code while this happens, but this game doesn't run on PCSX Agemo, and no$psx doesn't seem to have an option to log CPU activity to a text file  :(

Anyway, thanks in advance for any leads!

PS: for additional context, I'm currently working on Goemon Shin Sedai Shuumei, which has most of the game assets packed (though uncompressed) within a single "ALLDATA.X" file. I've already managed to split this file into individual files, and repack it back to its original form, matching the original 100%.
This file is stored in Mode2/Form2. I got a pretty good understanding of sector headers by now. So far I've kept them identical to the original file, so that can't be the issue here.

Everything

Can you post some more examples of the 8 byte header? Maybe there is some pattern that would provide a clue. In your sample I notice that byte 0 and byte 4 are 2's complement of each other, and byte 1 and byte 5 are 1's complement of each other, but that might just be a coincidence. Also, byte 7 is the same as byte 2 with the bit order reversed.
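These byte relationships can be double-checked with a few lines of Python. This is only a sketch: it assumes the sample is the header of file 0045 as listed later in this thread (3BAF A6B0 C550 BA65), and reverse_bits8 is a helper name made up for the example.

```python
def reverse_bits8(value):
    """Reverse the bit order of an 8-bit value (hypothetical helper)."""
    return int(f"{value:08b}"[::-1], 2)


# Assumption: this is the sample header being discussed (file 0045).
header = bytes.fromhex("3BAF A6B0 C550 BA65")

# Byte 0 and byte 4 are two's complements if they sum to 0 modulo 0x100.
twos_complement = (header[0] + header[4]) & 0xFF == 0
# Byte 1 and byte 5 are one's complements if every bit differs.
ones_complement = header[1] ^ header[5] == 0xFF
# Byte 7 is byte 2 with its bit order reversed.
bit_reversed = reverse_bits8(header[2]) == header[7]

print(twos_complement, ones_complement, bit_reversed)
```

All three checks pass for this one header. That alone doesn't say which of them are coincidences, so it's worth rerunning the same checks on other headers.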

I think your idea that the header is some sort of a checksum or hash value for the data sounds pretty reasonable. You might try calculating some common checksums and hashes for the data to see if anything matches up with the header. For a checksum, you can try either summing or xor'ing everything together, treating the data as an array of either 8-bit, 16-bit, or 32-bit values. CRC-32 is a pretty common hash algorithm, and the PSX would probably have been capable of calculating it on the fly for small files. Other hash algorithms like SHA-1 and SHA-256 would be possible, but many of them would have been more difficult to calculate on the fly with the 32-bit processor in the PSX.
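To try the simple candidates in one go, something along these lines might help. It's a rough sketch, not the game's algorithm: it assumes little-endian words (the PSX CPU is little-endian), zero-pads the data to a whole number of words, and candidate_checksums is just a name picked for the example. zlib.crc32 is the standard CRC-32 from Python's standard library.

```python
import struct
import zlib


def candidate_checksums(data):
    """Compute a few simple checksums over data, read as little-endian words."""
    results = {"crc32": zlib.crc32(data) & 0xFFFFFFFF}
    for width, fmt in ((1, "B"), (2, "H"), (4, "I")):
        # Assumption: zero-pad so the length is a multiple of the word size.
        padded = data + b"\x00" * (-len(data) % width)
        words = struct.unpack(f"<{len(padded) // width}{fmt}", padded)
        mask = (1 << (8 * width)) - 1
        total = 0
        xored = 0
        for word in words:
            total = (total + word) & mask  # additive checksum with wraparound
            xored ^= word                  # xor checksum
        results[f"sum{8 * width}"] = total
        results[f"xor{8 * width}"] = xored
    return results


# Tiny in-memory example; for a real test, pass the file contents without the header.
for name, value in candidate_checksums(bytes([1, 2, 3, 4])).items():
    print(name, hex(value))
```

Comparing each result against the 16-bit and 32-bit pieces of the header (in both byte orders) should show quickly whether any of the simple ones match.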

There's also the possibility of trying to reverse engineer the algorithm by disassembling the machine code, but it could be hard to find.

acediez

Alright, that's a very good lead already!

I did try, very hastily, to submit one of the files as hex data to a couple of online checksum calculators before coming here, but didn't get a match. I might've missed something though, since checksums and hashes are all new territory to me. I'll go read more about it.

Here are the headers of 11 consecutive files. I uploaded the complete files here too, which include these first 8 bytes.

0045: 3BAF A6B0 C550 BA65
0046: 841A B4FE 7CE5 55A8
0047: 8B8E 3185 7571 8D82
0048: 76C6 F3CB 8A39 6375
0049: 3B5C 95D9 C5A3 44D7
0050: 055E E68A FBA1 FA2F
0051: B80E 28FD 48F1 C0F8
0052: DC8B 4C56 2474 ECFB
0053: 0958 7E28 F7A7 B525
0054: 3931 71A2 C7CE 374A
0055: 6B93 799F 956C 199D

There are 215 files in total, most of which are in the same format, so there are plenty of samples besides these.
I'd really appreciate it if you could give it a look. Either way, you've already given me some useful leads to keep researching, thanks!


Edit: I said in the first post that this supposed checksum wasn't loaded into RAM... But if that were the case, how could it be used to verify data? I looked into it again more carefully, and sure enough, it IS in RAM, it was just stored somewhere else. And by setting up breakpoints on this location, I found the subroutine responsible for the checksum verification! I haven't looked at it in detail yet, but I think I've got all I need to reverse engineer the process, and eventually be able to apply a recalculated valid checksum to my modified files.

I'll still gladly take any leads or advice on dealing with checksums, especially if the above examples look like they might be part of a standard checksum algorithm (that would save me a lot of work).
But I can take my time now. At the very least, now I'm able to bypass this verification by forcing a match on the subroutine, and continue with the rest of my project.

Everything

It definitely looks like the first 16-bit value and the third 16-bit value are some kind of additive checksum because they all sum to 0x10000. I'll play with the data and see if I can figure anything else out.
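This can be checked against the headers posted above. A minimal Python sketch, assuming each 16-bit value is stored little-endian (so the bytes 3B AF read as 0xAF3B); HEADERS and first_and_third are names made up for the example:

```python
import struct

# The 8-byte headers as posted in this thread, keyed by file number.
HEADERS = {
    "0045": "3BAF A6B0 C550 BA65",
    "0046": "841A B4FE 7CE5 55A8",
    "0047": "8B8E 3185 7571 8D82",
    "0048": "76C6 F3CB 8A39 6375",
    "0049": "3B5C 95D9 C5A3 44D7",
    "0050": "055E E68A FBA1 FA2F",
    "0051": "B80E 28FD 48F1 C0F8",
    "0052": "DC8B 4C56 2474 ECFB",
    "0053": "0958 7E28 F7A7 B525",
    "0054": "3931 71A2 C7CE 374A",
    "0055": "6B93 799F 956C 199D",
}


def first_and_third(header_hex):
    """Return the first and third little-endian 16-bit values of a header."""
    first, _second, third, _fourth = struct.unpack("<4H", bytes.fromhex(header_hex))
    return first, third


for name, header_hex in HEADERS.items():
    first, third = first_and_third(header_hex)
    print(name, hex(first), hex(third), hex(first + third))
```

Every pair in this list adds up to 0x10000, i.e. zero once truncated to 16 bits, which is what you'd expect from a value stored next to its own negation.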

Edit: Yep, it looks like the first 16-bit value is the sum of all the *even* 16-bit values in the file. For example, for the data in the original post, 0xAF3B = 0xAF3B + 0x50C5 + 0x0112 + 0x044C + 0x0000 + ... The third value is the inverse additive checksum: 0x50C5 = 0x10000 - 0xAF3B.

The 2nd and 4th values might be some function operating on all of the odd 16-bit values...

acediez

Alright, I got it! I found both the subroutine that calculates the checksum and the one that validates it. You gave me enough clues to follow the assembly code and replicate the process to calculate valid checksums for my modified files. The checksum is made of:

[Sum of all 32bit values (4 bytes)] [Inverse sum of EVEN 16bit values (2 bytes)] [Sum of ODD 16bit values (2 bytes)]
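For anyone wanting to script this, here is a rough Python sketch of that layout. It is not a dump of the game's routine, and several details are assumptions: values are little-endian, the sums cover only the file body after the 8-byte header, "inverse" means 0x10000 minus the sum (as in the 0x50C5 = 0x10000 - 0xAF3B example above), "even" means the 1st, 3rd, 5th... 16-bit values, all sums wrap around, and the body length is a multiple of 4. build_header is a made-up name.

```python
import struct


def build_header(body):
    """Build the 8-byte checksum header for a file body (sketch, see assumptions)."""
    if len(body) % 4:
        # Assumption: bodies are whole 32-bit words; padding rules are unknown.
        raise ValueError("body length must be a multiple of 4")
    words16 = struct.unpack(f"<{len(body) // 2}H", body)
    words32 = struct.unpack(f"<{len(body) // 4}I", body)

    sum32 = sum(words32) & 0xFFFFFFFF          # sum of all 32-bit values
    even_sum = sum(words16[0::2]) & 0xFFFF     # 16-bit values at index 0, 2, 4...
    odd_sum = sum(words16[1::2]) & 0xFFFF      # 16-bit values at index 1, 3, 5...
    inverse_even = (-even_sum) & 0xFFFF        # 0x10000 - even_sum, kept to 16 bits

    return struct.pack("<IHH", sum32, inverse_even, odd_sum)


# Small made-up body: the 16-bit values 1, 2, 3, 4.
body = struct.pack("<4H", 1, 2, 3, 4)
print(build_header(body).hex())
```

With a modified file loaded into a bytes object (say `modified`, header included), the patched version would then be `build_header(modified[8:]) + modified[8:]`, provided the assumptions above match what the game actually does.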

Marked as solved! Thank you so much Everything!