
Author Topic: Different ways of storing text in a game file?  (Read 2769 times)

Ninto55

  • Jr. Member
  • Posts: 4
Different ways of storing text in a game file?
« on: December 09, 2015, 06:46:39 pm »
Hi, I figured I'd post this in the newcomers section because I am new and this kind of thing probably gets asked all of the time.

I'm working on a translation of a game in my free time. This is nothing serious, just something to work on for the experience. The game is an old Windows game from the late 90s, which is a port of a PC-98 game. I've narrowed down the files where text is stored (they're named MSXXXX.BIN, the Xs being hexadecimal from 0000 to 7F07), but I'm having trouble figuring out how the text is actually stored in these files. I've found some English text on a computer in the game which should be somewhere in the MS0017 file, but I cannot actually find it in the file. I was hoping it might just be in Unicode or something (I had to enable Japanese Unicode on my computer to get the game to display text properly, so I assume it would be Unicode), but I can't find it. Relative search isn't turning up anything either (and I am taking note of capitalization).

So I'm asking here for help. What are common ways for a game like this to store text? Is there something I could be keeping an eye out for in my hex editor? Is there just something I hadn't considered?

As a side question, do you guys know of any good utilities for PC debugging? I have a PC debugger program but it doesn't work that well, and I've been using Process Monitor to keep track of when the game is reading files (this is how I found that the English text was in MS0017.BIN). Are there any other useful tools I could be using to make my life easier?

Thanks for any help, or just for your time in reading this.

Kujou_Karen

  • Jr. Member
  • Posts: 3
  • Hihi!
Re: Different ways of storing text in a game file?
« Reply #1 on: December 10, 2015, 02:16:45 am »
Hihi! So, what encoding are you using? If you aren't seeing proper text, you had to do something special on your computer to get the text to appear correctly in game, and you're already in Unicode, you may have to try Shift-JIS or EUC-JP to see the proper text.

As for debugging, sorry I can't help you. (;﹏;) Surely someone else will come along with help for that!

magicalpatcher

  • Jr. Member
  • Posts: 34
Re: Different ways of storing text in a game file?
« Reply #2 on: December 10, 2015, 02:21:12 am »
Shift-JIS was my first thought too. I'm not too familiar with hacking PC games, but it's also possible that the text is compressed, in which case relative search wouldn't work.

Could you post a link to one of the BIN files as an example?

mz

  • Sr. Member
  • Posts: 447
  • Whore
Re: Different ways of storing text in a game file?
« Reply #3 on: December 10, 2015, 02:29:37 am »
It can't be Shift-JIS or EUC-JP, or the relative search would have found the text.

As magicalpatcher said, you should upload MS0017.BIN somewhere and tell us the English text that we should find in it.
There has to be a better life.

Kujou_Karen

  • Jr. Member
  • Posts: 3
  • Hihi!
Re: Different ways of storing text in a game file?
« Reply #4 on: December 10, 2015, 03:32:57 am »
Quote from: mz on December 10, 2015, 02:29:37 am
"It can't be Shift-JIS or EUC-JP, or the relative search would have found the text. As magicalpatcher said, you should upload MS0017.BIN somewhere and tell us the English text that we should find in it."

Not if he hasn't tried using them yet.

FAST6191

  • Hero Member
  • Posts: 3087
Re: Different ways of storing text in a game file?
« Reply #5 on: December 10, 2015, 12:37:14 pm »
Samples would help.

Relative search might not do much. I have even heard tales of (and seen tools for) 24-bit characters, though I have no reason to suspect that is happening here. Not to mention that EUC-JP and Shift-JIS include Roman characters beyond their usual "extension of ASCII" approach: both also have full-width, two-byte versions of the Roman letters. http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml

Another way is text stored as pictures. Japanese visual novels and low-text games are where I usually find examples of this.

Finding the text. What about altering the file and seeing what happens? It is well worth knowing the static analysis methods, but if testing something out consists of double clicking something and seeing what happens in the game then do that. Increase a value by 1, do a random string, do a known repeating string, do a set string counting up... all things that will tell you lots if you think about what you are doing.
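
For the "increase a value by 1" kind of experiment, a short script saves doing it by hand in a hex editor. The following is only a minimal sketch, assuming Python 3.9 or newer; the function names, the output file name and the offsets in the usage note are made up for illustration, and it writes a copy so the original file is left untouched.

```python
from pathlib import Path


def bump_bytes(data: bytes, start: int, length: int, delta: int = 1) -> bytes:
    """Return a copy of data with delta added (mod 256) to each byte in the range."""
    patched = bytearray(data)
    end = min(start + length, len(patched))  # clamp so a long range cannot overrun
    for i in range(start, end):
        patched[i] = (patched[i] + delta) % 256
    return bytes(patched)


def write_patched_copy(src: str, dst: str, start: int, length: int, delta: int = 1) -> None:
    """Read src, bump the chosen byte range, and write the result to dst."""
    Path(dst).write_bytes(bump_bytes(Path(src).read_bytes(), start, length, delta))
```

Something like write_patched_copy("MS0017.BIN", "MS0017_test.BIN", 0x2000, 0x100) would then change one 256-byte window (hypothetical offsets). Moving the window through the file a block at a time and checking the game after each change narrows down which region actually affects the text.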

Debugging. The king of all PC debuggers is IDA https://www.hex-rays.com/products/ida/ and the older PC versions are free (newer non X86 devices less so). No ifs, no ands, no buts -- IDA is top here and nothing really even comes close for debugging random programs that you have not just written/compiled. Pick a hacking conference and you can be pretty certain that IDA will be the weapon of choice for most there.

However, there are others. I do not suggest trying to wrangle GDB (the debugger that goes with GCC) or some similar dev tool debugger into working for hacking purposes. They are usually great for code you wrote yourself but less useful otherwise.
http://www.ollydbg.de/ not even close to IDA but useful enough and certainly worth having in the same way that you might have HxD for when sitting on a machine that is not yours.
http://radare.org/r/ aka radare2. For my money has more functionality than ollydbg, to the point in some cases it might just be able to see IDA in the distance, but the UI (or general lack of it) gets in the way for a lot of people. It also does many things other than X86/X64 out of the box unlike free IDA.
I guess I can also link http://lldb.llvm.org/ though I would use it to flank the others at best in this case.

Likewise if you know the program is written in a scripting language (think Java though that can be messy) or a semi scripting language/lots of libraries language (C# and .net and that family for example) then you might be able to decompile it.

PC debugging in general is a world unto itself so I am not sure I want to even start to truly cover it here, and the oddities inherent to old Japanese games make it even worse for me to contemplate -- they had some odd ideas about development and used odd formats. That said, if you are working with Japanese games then look up the Susie graphics plugin format -- PC game *hacking* got reasonably big over there and many people made graphics decoders using the Susie plugin format.

mz

  • Sr. Member
  • Posts: 447
  • Whore
Re: Different ways of storing text in a game file?
« Reply #6 on: December 10, 2015, 01:02:06 pm »
Quote from: Kujou_Karen on December 10, 2015, 03:32:57 am
"Not if he hasn't tried using them yet."

The thing about relative search is that it doesn't care if it is Shift-JIS, EUC-JP, Unicode or whatever you throw at it; as long as those encodings have the English characters in the same standard order.

What could have happened is that Ninto55 didn't search correctly for 16-bit (or higher) characters. So, if he uses Monkey-Moore, he should try searching again with wildcards enabled: L*i*k*e* *t*h*a*t*.
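
To make the idea concrete, here is a rough sketch of a relative search in Python (3.9 or newer assumed). It is not how Monkey-Moore is implemented; the function name and the stride parameter are my own. It compares the differences between neighbouring letters instead of the letters themselves, and stride=2 plays the role of the wildcard pattern above by skipping one byte between characters.

```python
def relative_search(data: bytes, text: str, stride: int = 1) -> list[int]:
    """Return offsets where bytes spaced `stride` apart differ from each other
    by the same amounts as the consecutive characters of `text`."""
    if len(text) < 2:
        raise ValueError("need at least two characters to compare differences")
    codes = [ord(c) for c in text]
    # Differences are taken mod 256 so a table that wraps past 0xFF still matches.
    diffs = [(b - a) % 256 for a, b in zip(codes, codes[1:])]
    span = (len(codes) - 1) * stride  # distance from first to last compared byte
    hits = []
    for pos in range(len(data) - span):
        if all((data[pos + (i + 1) * stride] - data[pos + i * stride]) % 256 == d
               for i, d in enumerate(diffs)):
            hits.append(pos)
    return hits
```

Calling relative_search(data, "opyright") looks for 8-bit text at any fixed offset from ASCII, and relative_search(data, "opyright", stride=2) does the same for 16-bit characters where only one byte of each pair changes. If both come back empty on every file, the text is probably not stored as a simple letter table at all.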

Ninto55

  • Jr. Member
  • Posts: 4
Re: Different ways of storing text in a game file?
« Reply #7 on: December 10, 2015, 05:21:53 pm »
I don't think a different encoding would make a difference for relative searching if English is in the same order. I tried to use the wildcards before but they didn't turn up anything, and I think I was using them right. I also wanted to try altering the hex values to see what happens, but I'm not sure which values to edit. I wanted to just increment every value, but I couldn't find an easy way to do that without going through and editing each value by hand. And obviously I don't want to mess with every value, just whichever ones have text, but I can't figure out which those are.

I made this topic to try to get some common kinds of encryption I could check for, but I didn't word that right in my OP. Every document I came across while learning how to do this says something like "And that is how you change a game's text! Assuming it isn't encrypted of course, then it gets much harder!" which isn't very helpful, but I understand they're trying to keep it beginner level. Anyway, instead I guess I can just pass this stuff off to you guys and see if you can find anything.

http://www.filedropper.com/ms0017 - MS0017.BIN file (What is a good file sharing site? The few times I need to upload a file I have trouble finding a good one, but this looks like it will do).

http://i.imgur.com/4sgcB8a.png - Image with four different text boxes (well five but the last two show up together and the bottom one is blank for the first three) with English text that should be in there. I've mostly been focusing on "Copyright" because it is a long, single word, more specifically I've been trying to find "opyright" so I don't have to worry about capitalization, but I haven't found anything with "Copyright" or anything else.
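
One way to cover the encodings suggested earlier in the thread is to generate the byte patterns for the word in each of them and search for those directly. This is just a sketch, assuming Python 3.9 or newer and an ASCII-only search word; the function names are made up, and the list of encodings is a guess at likely candidates, not something known about this game.

```python
def candidate_patterns(word: str) -> dict[str, bytes]:
    """Encode an ASCII word the ways a late-90s Japanese PC game might store it."""
    # Full-width forms (U+FF01..U+FF5E) sit at a fixed offset from ASCII '!'..'~'.
    fullwidth = "".join(chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c for c in word)
    return {
        "ascii": word.encode("ascii"),
        "utf-16-le": word.encode("utf-16-le"),
        "shift_jis full-width": fullwidth.encode("shift_jis"),
        "euc_jp full-width": fullwidth.encode("euc_jp"),
    }


def find_patterns(data: bytes, word: str) -> dict[str, list[int]]:
    """Return every offset at which each candidate encoding of word occurs."""
    hits = {}
    for name, pattern in candidate_patterns(word).items():
        offsets, pos = [], data.find(pattern)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(pattern, pos + 1)
        hits[name] = offsets
    return hits
```

Running find_patterns(open("MS0017.BIN", "rb").read(), "opyright") returns one list of offsets per encoding. Empty lists everywhere would back up the suspicion that the file is compressed or encrypted, or that the text is drawn from pictures.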

mz

  • Sr. Member
  • Posts: 447
  • Whore
Re: Different ways of storing text in a game file?
« Reply #8 on: December 10, 2015, 06:11:45 pm »
Well, if the reader could crack encryptions, he probably wouldn't be reading those tutorials in the first place.

Anyway, since the game has kanji, I'd say it's impossible that it uses 8-bit characters. It also seems to have a couple of different fonts for English, if there are more letters like that E in "Eメール".

When you're not sure which file the text is in, put all the files in a single zip file without compression and do a relative search within that big zip file. If the text is really in this MS0017.BIN file, it looks encrypted or compressed to me.
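
If you don't have an archiver handy that can store without compressing, Python's standard zipfile module can do it. A small sketch, assuming Python 3.9 or newer; the function name, folder layout and archive name are hypothetical.

```python
import zipfile
from pathlib import Path


def pack_uncompressed(folder: str, archive: str, pattern: str = "*.BIN") -> list[str]:
    """Store every matching file in one archive with no compression (ZIP_STORED)."""
    names = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(Path(folder).glob(pattern)):
            zf.write(path, arcname=path.name)  # raw bytes are copied in unchanged
            names.append(path.name)
    return names
```

Because the entries are stored rather than deflated, each file's bytes appear unchanged inside the archive, so a relative search over the one big file covers every BIN at once. The file name stored just before each entry's data tells you which file a hit belongs to.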

What game is this?

Ninto55

  • Jr. Member
  • Posts: 4
Re: Different ways of storing text in a game file?
« Reply #9 on: December 10, 2015, 08:57:37 pm »
This is Giten Megami Tensei. And I know the text is in that file: the game reads that file when I walk into the room (the hero's bedroom, which also gives you the option to save), and when I removed it I got a unique bit of text. It was all Japanese and I assume it was some kind of error message, but it had two different speakers and it looked like some kind of dialog (which would be a weird way to display an error like that). I figure it is encrypted. How do hackers go about decrypting things like this? What are some general methods, or examples of people doing this in the past?

mz

  • Sr. Member
  • Posts: 447
  • Whore
Re: Different ways of storing text in a game file?
« Reply #10 on: December 11, 2015, 01:51:48 am »
On a console you could just set a write breakpoint on one of the tiles with the text and then go backwards from there to figure out how it got the text from the ROM.

Sometimes the text is completely uncompressed or unencrypted in RAM, so you could set a breakpoint there too (or maybe see if it's a simple XOR or something.)
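
Checking for the "simple XOR" case doesn't need a debugger. The sketch below (Python 3.9 or newer assumed, function names my own) tries all 256 single-byte keys against a piece of text you know should be in the file. It only covers a one-byte key; a multi-byte key, a rolling key or real compression would not be caught.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)


def find_single_byte_xor(data: bytes, known: bytes) -> list[tuple[int, int]]:
    """Return (key, offset) pairs where `known` appears once data is XORed with key."""
    hits = []
    for key in range(256):
        # XOR the short needle instead of the whole file: a ^ k == b  <=>  a == b ^ k
        needle = xor_bytes(known, key)
        pos = data.find(needle)
        while pos != -1:
            hits.append((key, pos))
            pos = data.find(needle, pos + 1)
    return hits
```

The known bytes should be the text in whichever encoding you suspect, for example "Copyright" as plain ASCII or as full-width Shift-JIS. Key 0 corresponds to the unencrypted file, so a hit with key 0 just means the text was there in the clear all along.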

In any case, unless you're very experienced and can tell well known compressions and encryptions from just looking at a hex editor, you have to use a debugger and/or a disassembler. Then you just write your own decompressor/compressor or decrypter/encrypter in a programming language of your own choosing. :D

magicalpatcher

  • Jr. Member
  • Posts: 34
Re: Different ways of storing text in a game file?
« Reply #11 on: December 11, 2015, 01:59:14 am »
I took a look at the binary file and saw there were a lot of repeated values. For example, at 0x2B7E, the four bytes "7F 3F BE FE" are repeated 12 times. Most forms of compression shouldn't produce repeated values like this so I'm guessing the data probably isn't compressed.
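
For anyone who wants to repeat that check on the other BIN files, this is roughly how the repeats can be found automatically. It is only a sketch, assuming Python 3.9 or newer; the function name and the default thresholds are arbitrary choices of mine.

```python
def repeated_runs(data: bytes, width: int = 4, min_repeats: int = 4) -> list[tuple[int, bytes, int]]:
    """Return (offset, pattern, count) for back-to-back repeats of a width-byte pattern."""
    runs = []
    pos = 0
    while pos + width <= len(data):
        pattern = data[pos:pos + width]
        count = 1
        # Count how many times the same block follows itself immediately.
        while data[pos + count * width: pos + (count + 1) * width] == pattern:
            count += 1
        if count >= min_repeats:
            runs.append((pos, pattern, count))
            pos += count * width  # skip past the whole run
        else:
            pos += 1
    return runs
```

Long runs like the one at 0x2B7E would show up as a single entry with a count of 12. Treat the result as a hint only: repeats argue against most general-purpose compression, but they say little about a simple scheme such as XOR with a fixed key, which would turn repeated plain data into repeated scrambled data.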

Do you know approximately how many characters there should be in this file (i.e. can you write down all the dialog that should be in this binary file)? If the characters are encoded as 16-bit characters, then there should be about 10,000 characters in the file. If you find significantly more than 10,000 characters that correspond to this file, then it's almost certainly compressed, and if you find significantly fewer, then either you forgot to account for control codes or there is data in the file besides text.

Ninto55

  • Jr. Member
  • Posts: 4
Re: Different ways of storing text in a game file?
« Reply #12 on: December 12, 2015, 03:49:25 pm »
I don't know how many characters are supposed to be in the file, all I know is the text for that room uses that file. There are probably control codes, and maybe other points in the game access the same file (it is the text for using a computer, I wouldn't be surprised if other computers in the game repeat some text). I'll try out some of those debuggers mentioned earlier and see what I can find, and if that doesn't work there is always blind trial and error until I change a value that does something.

Thanks for the help guys.