
Topic: Can I cleanly pull this script out or am I pretty much screwed?

Metallica93

  • Jr. Member
  • Posts: 4
I have a Game Boy Color game that someone else did all of the table work on, but they gave me enough insight that I could start working on things I've always wanted to do: script comparisons (i.e. U.S. vs. E.U. version), checking for lines I've never seen before, correcting the script for any errors, etc. I've learned a lot and it's fun, but I've hit a hell of a wall.

For some reason, the game doesn't use a single code to indicate a line break to the second line of a text box or a jump to another text box altogether. Instead, line breaks can be WT, WV, WX, WY, etc. (e.g. "Where areWXyou going?"), while text box breaks can be UWU, UWV, UWW, etc. (e.g. "This is theWYend for you!UWV" -> press A and cue the next text box).

This makes it extremely difficult to clean up, because a sentence beginning with "Where?" might be preceded by the text box break "UWW", making it read as "UWWWhere?". I don't know how to edit the table to account for that.

I also can't seem to edit three characters at a time. Trying to replace "UWW" (0E 10 10) with something like (BR) still gets read as "(BR)W".

Examples: https://i.imgur.com/2VHKG06.jpg

I hope this made sense as I'm still learning, but I definitely need assistance in trying to dump this script cleanly. Any and all advice would be appreciated!
« Last Edit: August 12, 2021, 09:59:43 pm by Metallica93 »

FAST6191

  • Hero Member
  • Posts: 3238
Re: Can I cleanly pull this script out or am I pretty much screwed?
« Reply #1 on: August 12, 2021, 07:15:57 am »
Interesting setup there. I wonder why there are so many variants (new line and end of paragraph/box being the main two, though I could also see "press A", other end-of-line types, perhaps auto-scroll to the next box, and probably where on screen the text sits), but that we can leave for another day.
Just for the sake of clarity: when you say the breaks can be [blah], does the corresponding hex match up with the same characters that could plausibly appear in the text, or is that just what the table maker used as a stand-in? If it is a stand-in, replace it with punctuation not seen in the game and that should solve an issue or two. If the game actually uses such things, that is bizarre, but it has been seen. 99% of the time such a thing will feature an escape sequence. In your UWW example, E0 precedes 10, which a standard relative encoding should not have since U and W are near each other; if it is not relative, then OK. If it is an escape character, you probably want a tool (or a hex replacement) to scan the file for E0 values and then act on the next however many bytes accordingly.
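The escape-byte scan described above could look something like this minimal Python sketch. The escape value E0 is the one named in the post; the number of argument bytes (`ARG_COUNT = 2`) and the function name are hypothetical and would need confirming against the actual ROM.

```python
# Minimal sketch of an escape-byte scan. ESCAPE is the value named above;
# ARG_COUNT is a hypothetical guess that must be checked against the ROM.
ESCAPE = 0xE0
ARG_COUNT = 2


def find_escape_sequences(data: bytes, escape: int = ESCAPE, arg_count: int = ARG_COUNT):
    """Return (offset, args) for every escape byte, skipping over its arguments."""
    results = []
    i = 0
    while i < len(data):
        if data[i] == escape:
            args = tuple(data[i + 1:i + 1 + arg_count])
            results.append((i, args))
            i += 1 + arg_count  # jump past the arguments so they are not rescanned
        else:
            i += 1
    return results


if __name__ == "__main__":
    sample = bytes([0x05, 0xE0, 0x10, 0x10, 0x22, 0xE0, 0x10, 0x11])
    for offset, args in find_escape_sequences(sample):
        hex_args = " ".join(f"{a:02X}" for a in args)
        print(f"{offset:#06x}: escape, args {hex_args}")
```

Skipping the argument bytes matters: if an argument happens to equal the escape value, a naive byte-by-byte search would report a false second hit.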

If it is for comparison purposes and there is no need to reinsert whole scripts or change things radically, you don't necessarily need to keep the data those different codes provide. To that end I would probably match the indicators with a normal text editor search (you could probably do it without even getting your hands dirty with regex) and replace them with suitable breaks.
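If a text editor's find/replace gets tedious, the same cleanup can be scripted. This is a minimal sketch assuming the break codes are exactly the ones listed in the first post (line breaks WT/WV/WX/WY, box breaks UWU/UWV/UWW); the real game may use more, and any genuine all-caps text containing those letter pairs would be mangled.

```python
import re

# Patterns taken from the examples in the first post; the real game may use more.
BOX_BREAK = re.compile(r"UW[UVW]")    # UWU, UWV, UWW -> new text box
LINE_BREAK = re.compile(r"W[TVXY]")   # WT, WV, WX, WY -> second line of a box


def clean_dump(text: str) -> str:
    """Turn the break codes in a table-decoded dump into readable breaks."""
    # Box breaks go first: "UWV" contains "WV", which would otherwise be
    # mistaken for a line break and leave a stray "U" behind.
    text = BOX_BREAK.sub("[BOX]\n", text)
    return LINE_BREAK.sub("\n", text)


if __name__ == "__main__":
    print(clean_dump("This is theWYend for you!UWVWhere areWXyou going?"))
```

Because the regex consumes "UWW" as one match, "UWWWhere?" becomes "[BOX]" followed by "Where?" on its own line.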
Finding differences and replacing the odd word should then be doable enough without such things.

One of the proper ways is more likely to be to use pointers. A game like this probably defies the tradition of older consoles (and even newer ones; I saw plenty still doing it on the DS) and might actually use its line break command to do new lines, as opposed to most older things that use pointers to indicate the end of a line.
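For anyone looking up pointers: on the Game Boy, a pointer is usually a 16-bit little-endian CPU address, and addresses 4000-7FFF only mean something relative to whichever 16 KiB ROM bank is switched in. The sketch below converts such pointers to file offsets under one loudly stated assumption: the text sits in the same switchable bank (bank 1 or higher) as the pointer table. Whether this particular game uses plain 16-bit pointers at all is not established in the thread, and the function names are hypothetical.

```python
import struct

BANK_SIZE = 0x4000  # each Game Boy ROM bank is 16 KiB


def gb_address_to_offset(address: int, bank: int) -> int:
    """Convert a CPU address seen by the game into a ROM file offset."""
    if address < 0x4000:
        return address  # 0000-3FFF is always bank 0
    if address < 0x8000:
        # 4000-7FFF shows whichever bank is currently switched in
        return bank * BANK_SIZE + (address - 0x4000)
    raise ValueError(f"{address:#06x} is not a ROM address")


def read_pointer_table(rom: bytes, table_offset: int, count: int):
    """Read `count` little-endian 16-bit pointers and convert them to file offsets.

    Assumes the text lives in the same switchable bank (1 or higher) as the table.
    """
    bank = table_offset // BANK_SIZE
    pointers = struct.unpack_from(f"<{count}H", rom, table_offset)
    return [gb_address_to_offset(p, bank) for p in pointers]
```

With a table at file offset 0x8000 (bank 2), a stored pointer of 4010 maps to file offset 0x8010.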

As for three characters at a time, I am a bit hazy on the specifics of this one. Some editors and table tools might top out at 16 bits, but there are some 24-bit efforts (Crystaltile2 is one I am pretty sure goes there; its search tools certainly do).
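Tools that handle multi-byte table entries generally decode by taking the longest entry that matches at each position. A minimal Python sketch of that idea follows, assuming a simple `HEX=text` table format; the `<BOX>` entry and the function names are hypothetical, and real table formats have extra syntax this does not handle.

```python
def parse_table(lines):
    """Parse table-file lines of the form HEX=text into {bytes: text}."""
    table = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if "=" not in line:
            continue  # skip blank or unsupported lines
        hex_part, text = line.split("=", 1)
        table[bytes.fromhex(hex_part)] = text
    return table


def decode(data: bytes, table) -> str:
    """Decode bytes using the longest matching table entry at each position."""
    max_len = max((len(key) for key in table), default=1)
    out = []
    i = 0
    while i < len(data):
        # Try a 3-byte entry first, then 2, then 1.
        for size in range(min(max_len, len(data) - i), 0, -1):
            chunk = data[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(f"[{data[i]:02X}]")  # no entry: show the raw byte
            i += 1
    return "".join(out)
```

With entries `0E=U`, `10=W` and `0E1010=<BOX>`, the bytes 0E 10 10 10 decode as `<BOX>W`: the three-byte entry wins, and the leftover 10 is decoded on its own.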

Metallica93

  • Jr. Member
  • Posts: 4
Re: Can I cleanly pull this script out or am I pretty much screwed?
« Reply #2 on: August 12, 2021, 09:33:51 pm »
If I understood your first paragraph correctly, a hex value of 10 11 10 translates to "WXW" according to the table I was given, the "WX" being the line break while the second 'W' is the start of a sentence. See the very first circled text in the Imgur link.

The original plan was to just dump the scripts of both versions and use the Compare plug-in for Notepad++ for script comparisons, but the code doesn't match up exactly and Notepad++ didn't seem to like looking at ~7,300,000 characters twice. I did replace the most common bits of translated code with breaks, but it's still pretty messy. There are just so many unique strings that it would probably take me just as long to filter them out in a table as it would to finish looking at the script line by line :/
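If Notepad++ struggles with dumps this size, Python's standard `difflib` can compare two dumps line by line without a GUI. This is a sketch, not a tested workflow: it assumes each dump has its breaks converted to real newlines first, and that the files are UTF-8 (adjust `encoding` to whatever the dumper writes). The default file names are placeholders.

```python
import difflib
import sys


def compare_scripts(old_lines, new_lines, old_name="us.txt", new_name="eu.txt"):
    """Return a unified diff of two script dumps, one string per output line."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name, lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare.py us_dump.txt eu_dump.txt
    with open(sys.argv[1], encoding="utf-8") as a, open(sys.argv[2], encoding="utf-8") as b:
        old = a.read().splitlines()
        new = b.read().splitlines()
    for line in compare_scripts(old, new, sys.argv[1], sys.argv[2]):
        print(line)
```

The output only lists changed lines (prefixed `-` and `+`) plus a little context, which is far shorter than reading both scripts side by side.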

And pointers are something I'll have to look up, then. Honestly, I thought the table I was given was the pointers.

Here is the table, for reference, if you'd like to better judge what I was given/am working with: https://i.imgur.com/KuxYFbE.png

I kind of assumed everything would be close together (e.g. 01 = A, 02 = B, etc.), but it doesn't look that way.
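Whether letters are "close together" is what a relative search tests: it looks for bytes whose gaps match the gaps between letters of a known word, so it finds the word even when A is not 0x41. A minimal sketch follows; the function name is hypothetical, and if this game's alphabet is not in order, it simply returns no hits.

```python
def relative_search(data: bytes, word: str):
    """Return offsets where the gaps between bytes match the gaps between letters.

    Finds a word even if the game's alphabet is shifted (e.g. A=01 instead of
    A=0x41), as long as the letters are stored in alphabetical order. Keep the
    search word in one case unless upper and lower case are known to keep
    ASCII's spacing.
    """
    target = [ord(b) - ord(a) for a, b in zip(word, word[1:])]
    hits = []
    for i in range(len(data) - len(word) + 1):
        window = data[i:i + len(word)]
        if [b - a for a, b in zip(window, window[1:])] == target:
            hits.append(i)
    return hits
```

Once a hit is found at offset `h`, the byte value for the word's first letter is `data[h]`, and the rest of that alphabet can be filled in by counting forward.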
« Last Edit: August 12, 2021, 10:22:49 pm by Metallica93 »

LostTemplar

  • Hero Member
  • Posts: 910
    • au-ro-ra.net
Re: Can I cleanly pull this script out or am I pretty much screwed?
« Reply #3 on: August 13, 2021, 08:55:34 am »
Looks to me like 10 is the actual control code and the one byte after that is an argument (table files support that, by the way). Same for the ones starting with U, but it seems to have two arguments.
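Reading 10 and 0E as control codes with one and two argument bytes respectively can be sketched as a decoder that prints each control code plus its arguments as a raw hex tag instead of running the arguments through the letter table. The argument counts come from the guess above; the letters in `chars` below are made up for illustration. Note that under this reading 0x10 can never decode as a plain "W", so it is a hypothesis to test rather than a settled format.

```python
# Hypothetical values following the guess above: 0x10 takes one argument
# byte and 0x0E takes two. Not verified against the game.
CONTROL_ARGS = {0x10: 1, 0x0E: 2}


def decode_with_controls(data: bytes, chars, control_args=CONTROL_ARGS) -> str:
    """Decode text, printing control codes and their arguments as <XX YY> tags."""
    out = []
    i = 0
    while i < len(data):
        code = data[i]
        if code in control_args:
            n = control_args[code]
            group = data[i:i + 1 + n]  # the code plus its argument bytes
            out.append("<" + " ".join(f"{b:02X}" for b in group) + ">")
            i += 1 + n
        else:
            out.append(chars.get(code, f"[{code:02X}]"))  # unknown bytes as [XX]
            i += 1
    return "".join(out)
```

With a made-up table `{0x2A: "H", 0x2B: "i"}`, the bytes 10 11 2A 2B 0E 10 10 decode to `<10 11>Hi<0E 10 10>`.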

To answer the question in the title: yes, you can always pull a script out cleanly. Sometimes table files might not cut it, but I don't think that's the case here. Looking at the game code that decodes the strings would be the definitive solution.

Metallica93

  • Jr. Member
  • Posts: 4
Re: Can I cleanly pull this script out or am I pretty much screwed?
« Reply #4 on: August 13, 2021, 09:04:48 pm »
So 10 can be both a control code and 'W'? Or does its function depend on what is around it?

And I assume by "game code" you mean the binary data, but my limited knowledge of this means I don't know where to start (hence asking you kind folks).

All I've been able to really work out is this: "So long ago", which is the first line of text in the game, is preceded by "WW" when translated using the current table. It appears as "WWSo long ago". The binary data is 10 10 FC (and then the rest). Editing those two Ws breaks the game and doesn't add anything (e.g. replacing the second W with a '!' doesn't net me "!So long ago" or anything).

"WWWant" = 10 10 10 (and then the rest of the binary data for "ant"), as another example.

KingMike

  • Forum Moderator
  • Hero Member
  • Posts: 7183
  • *sigh* A changed avatar. Big deal.
Re: Can I cleanly pull this script out or am I pretty much screwed?
« Reply #5 on: August 13, 2021, 09:44:57 pm »
It looks like an initial 10 is a control code, then the next byte is the line length. Then the text follows.

In the case of the three "10"s in a row you posted, that is a control (probably event script code), then another 10 (16 in decimal) for the line length, then a third "10" for a "W" character as the first character in the string.
I assume the "0E" following one of the sample strings is the line break code.

It's not hard to write a custom dumper for this (assuming you can figure out the rest of the control codes, and how to tell when a string starts and ends),
but I don't think that's something you're going to be able to do simply with available tools.
You would probably need some programming skills to write your own.
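A custom dumper along those lines might start like the sketch below. It assumes the reading given above: 0x10 opens a string, the next byte is its length, and the length counts the text bytes that follow (the thread does not confirm that last part). It also treats 0x0E as a break between strings. Every byte inside a string goes through the letter table, which is how the same 0x10 can be a control code in one position and a "W" in the next. Apart from 0x10 = W from the posted table, the `chars` values in the test are made up.

```python
# Assumptions from the reply above, not verified against the ROM:
LINE_START = 0x10  # control byte that opens a string
BREAK = 0x0E       # guessed break code between strings


def dump_strings(data: bytes, chars, start: int = 0):
    """Split data into strings of the form 10 <length> <text...>.

    Bytes inside a string are always looked up in `chars`, so a 0x10 there
    is treated as a letter rather than a control code.
    """
    out = []
    i = start
    while i < len(data):
        code = data[i]
        if code == LINE_START and i + 1 < len(data):
            length = data[i + 1]
            text = data[i + 2:i + 2 + length]
            out.append("".join(chars.get(b, f"[{b:02X}]") for b in text))
            i += 2 + length
        elif code == BREAK:
            out.append("<BREAK>")
            i += 1
        else:
            out.append(f"[{code:02X}]")  # unknown byte between strings
            i += 1
    return out
```

Under these assumptions, the bytes 10 04 10 2A 2B 2C 0E (with made-up 2A=a, 2B=n, 2C=t) dump as `["Want", "<BREAK>"]`.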
"My watch says 30 chickens" Google, 2018

Metallica93

  • Jr. Member
  • Posts: 4
Re: Can I cleanly pull this script out or am I pretty much screwed?
« Reply #6 on: August 13, 2021, 10:16:46 pm »
Didn't think of that when looking for patterns, so I appreciate the insight there.

No programming skills yet, sadly (PowerShell scripting, anyway?), but at least I'll only have to dig through a couple hundred thousand characters (that are mostly broken up with carriage returns, at least) instead of several million.

On the bright side, it seemed I was mostly doing it correctly from the start. Many thanks to everyone for their informative replies :)