
Author Topic: Translations: Table File Standard, Generic Dumpers, and more!  (Read 22967 times)

Nightcrawler

  • Hero Member
  • *****
  • Posts: 5757
    • View Profile
    • Nightcrawler's Translation Corporation
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #20 on: August 10, 2010, 11:55:46 am »
Thanks for the feedback. It would be great to have you on board with it if it will ever see use beyond my personal stuff. The 'official' syntax is '\r' and '\n'. The others were typos. Will fix. Thanks. Speaking of this, I don't think we need '\t'. Actual tabs should be able to be used freely in the table entries themselves, so it seems unnecessary.

Some thoughts on the table standard:

  • I agree with the deprecated table entry types such as line break, bookmarks, etc
  • I like the notion of having a table entry that switches tables automatically, but I don't yet understand how many cases your implementation will cover.
  • I don't believe that kanji arrays have a place in a table standard.  I only know of 2-3 games that use this.
  • There are a few spots in the doc where you use /r and /n instead of \r and \n.

I left the table switch part a little open ended because I believe the details of that should be left to the utility to cover the broadest amount of cases. From the table format's perspective, I felt that it only needed to know that there is a table switch, not necessarily the details on how it's used. Standardizing how it's used might lead to standardization of dumpers/inserters? I'm not sure how much the table file itself should dictate on this.

One example I had in there suggested that it will switch to the next table and then stay there until it hits the switch again. However, how long the next table is used before switching back is really undefined because it probably differs game to game.  Some switch for just the next byte. Some switch until the switch is seen again. Some switch until the end of the string. Some switch on other conditions.

Now that I think about it, perhaps a similar syntax to the Kanji Array feature might work where it's specified how long to be in the next table before switching back. 1 character, multiple characters, or infinite until next switch is encountered. Just throwing it out there. The details might still be better left out.

I thought it best only to define a switch is there and leave the implementation details out of the table file.
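To make the "how long do we stay in the next table" question concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `tables` and `switches` structures, the `decode` function, and the byte values are invented for illustration and are not the table file syntax or any utility's behavior. A switch entry carries a count: a number means "decode that many characters in the target table, then fall back", and `None` means "stay until another switch byte is seen".

```python
def decode(data, tables, switches, start):
    """Decode bytes to text, honoring table switches of different durations.

    tables:   {table_name: {byte: text}}
    switches: {(table_name, byte): (target_table, count)}
              count = N    -> decode N characters in target, then fall back
              count = None -> stay in target until another switch is hit
    All of these names are assumptions made up for this sketch.
    """
    out = []
    current, previous, remaining = start, None, None
    for byte in data:
        if (current, byte) in switches:
            target, count = switches[(current, byte)]
            previous, current, remaining = current, target, count
            continue
        # Unknown bytes are emitted as a visible hex placeholder.
        out.append(tables[current].get(byte, f"<${byte:02X}>"))
        if remaining is not None:
            remaining -= 1
            if remaining == 0:
                current, remaining = previous, None
    return "".join(out)


tables = {"lower": {1: "a", 2: "b"}, "upper": {1: "A", 2: "B"}}
switches = {
    ("lower", 0xF0): ("upper", 1),     # switch for just the next character
    ("lower", 0xF1): ("upper", None),  # switch until the switch byte recurs
    ("upper", 0xF1): ("lower", None),
}
print(decode(bytes([1, 0xF0, 1, 1, 0xF1, 2, 2, 0xF1, 2]), tables, switches, "lower"))
```

Since `decode` is called once per string, the "switch until the end of the string" case is simply the `None` duration with no switch back. Switching on other game-specific conditions is exactly the part this sketch cannot express, which is the argument for leaving it to the utility.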


I generally agree with you on the Kanji array, but hear me out. First, it was considered only because I considered possible compatibility with all previously used utility table features (Romjuice had this one).  I proposed including this because it fits as a feature in the abstraction model we have. We don't have to examine any individual characters or get into game specifics.  It's a feature that aids only in obtaining the correct table value. I thought it could also represent an example of how we might add features to the table format to make it grow and support more while retaining our abstraction level.

Thoughts now that you've heard explanations?

Quote
On the dumper:

It'll be interesting to see what your approach amounts to.  I'm positive it'll be more user-friendly than Atlas (and probably the rest of the inserters/dumpers out there), which is a great thing for people starting out or dealing with simpler games...most people like to play with a gadget out of a box before reading an instruction manual.  I do believe that the approach is going to disappoint when it comes to games more complex than "Here's a relatively neat pointer table and/or here's a nice, neat bank of text.".  But maybe it's been because I've seen too many of Gideon's scripts and too many PSX scripts and know they require game-specific context, which admittedly, Atlas relies on the dumper (or user) for solving.

My development version is already more advanced than every other available dumper I know of.  I'd post more screenshots, but I keep changing the interface and revamping some of my design every other week. It supports things like multi-file dumping/inserting, pascal strings, full fixed length string/line support, and enhanced most features from Cartographer. Atlas compatible output has not been added yet. I'm still deciding how far I will go with that. Different situations will use different appropriate commands. Basic stuff will use autowrite, debug mode will use individual string writes, fixed length and pascal have their own deal, etc.

Disappointment will be true if you expect features with unreliable results such as Kajitani-Eizan would like, or abilities that better belong in a different utility such as pointer list generation. I'm all for expanding its capabilities in a 'smarter'/reliable manner. What I mean by that is some of the possibilities I mentioned a few posts up that might aid in extracting text or parsing more complicated scripting blocks. Anything that produces definable, reliable results.

Native Insertion (other than Atlas compatible output) is shaping up to be done via the same interface. Basically, if you load up the same config file you dumped with (or entered all the same information) the utility should be able to insert anything it dumped. Insertion is still in its infancy though, but that's the way it's shaping up.

At the end of the day, it's never going to be a be-all-end-all, nor was it ever intended to be. Some people will be disappointed and not talk to me. Some people will bake me cookies. I started it as a project for myself so I didn't need to bother with several variations on custom dumpers for some common situations that none of the currently available tools did the job right with. I also thought it would be a good opportunity to promote some utility compatibility and standardization. It just so happens others are interested in sharing in the results and I think I can give them something better (in some areas) than what we have, if nothing else. :)

With that said, I will certainly welcome any ideas or thoughts you have. I will think about them and consider them. I can't promise I will use them all, and I can't promise you won't be disappointed! ;D
TransCorp - Over 20 years of community dedication.
Dual Orb 2, Wozz, Emerald Dragon, Tenshi No Uta, Glory of Heracles IV SFC/SNES Translations

Nightcrawler

Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #21 on: August 10, 2010, 11:59:24 am »
I've also had this problem where the text chunk is in order but the pointers are out of order. Dumping from the table gives a very confusing script sometimes. Not to mention that you can have duplicate text chunks. It would be a great option to have.

Maybe I'm just not understanding what he was saying. What you just said there is something I would consider. Re-ordering the whole thing by text location rather than pointer order. Checking for duplicates could be done as well. That stuff can be done with no scanning at all.

Kajitani-Eizan

  • Hero Member
  • *****
  • Posts: 547
  • You couldn't kill me if I tried to let you
    • View Profile
    • Kajitani-Eizan's Patch Site
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #22 on: August 10, 2010, 02:43:56 pm »
That's not tangible. That's just a guess from an algorithm. The very fact that you have to manually go back through them attests to that.

i think i see what you're saying, but i cannot remotely agree with it. you're saying that you wish for your program output to be "perfect". but that requires that the user put excessive effort into making sure the program input is "perfect". for example, you want to be sure that every pointer your program attempts to dump is in fact a pointer to valid text, but this requires that the user spend excessive effort either building many many tiny pointer tables, or building a pointer list, or writing their own utility that has basically the same functionality as your program but supports this sort of pointer searching to build a pointer list.

i don't see what the point of that is when the point is to make a program that makes things easier for the user. your program gains "perfection" (which i don't think is important to begin with), but at the cost of the whole point of doing it in the first place. romhacking by its nature is ugly; it was never supposed to be a process that's 100% "perfect" 100% of the way.

Quote
If you have hard coded pointers in the assembly code, you can locate them in seconds with a debugger and add to a pointer list. Or if you know the instructions, you can search via instruction pattern.

the former is not practical when dealing with large numbers of pointers/strings. the latter is not reliable in the general case at all... isn't your issue with brute force and pointer searching one of reliability?

Quote
But this is a dumper/inserter, not a scanning or pattern utility for possible pointers.

those go hand in hand. as stated above, a scanning utility for pointers will do much the same stuff a dumper would do... it seems odd to separate the two into separate programs.

Quote
In a raw blind dump, you don't know where the start of a string is. You're guessing based on where an end token is or other unreliable means.

often you'll have nothing between strings other than nulls. in that case, that makes start-of-string detection pretty reliable. 100% reliable, in fact. and if you don't, well... the raw blind dump itself would not be "perfect" then, as the user would have to manually go back and filter out the chaff from the wheat. that is to say, the raw blind dump (your program output) would not be 'tangible'.

and in fact, if the only other bytes embedded between strings are bytes with small values (out of range of the table) or pointers (which you can sense), it's still 100% reliable even with data in between strings.

Quote
or stick to your own tools that do this job so well for you

my, no need to be snippy

Maybe I'm just not understanding what he was saying. What you just said there is something I would consider. Re-ordering the whole thing by text location rather than pointer order. Checking for duplicates could be done as well. That stuff can be done with no scanning at all.

what i was saying does not quite equate to this, though it can be similar. but yeah, this would be a cool feature.

Nightcrawler

Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #23 on: August 10, 2010, 03:23:23 pm »
We're getting closer to understanding one another. There's hope, but still a gap remains!  ;D

i think i see what you're saying, but i cannot remotely agree with it. you're saying that you wish for your program output to be "perfect". but that requires that the user put excessive effort into making sure the program input is "perfect". for example, you want to be sure that every pointer your program attempts to dump is in fact a pointer to valid text, but this requires that the user spend excessive effort either building many many tiny pointer tables, or building a pointer list, or writing their own utility that has basically the same functionality as your program but supports this sort of pointer searching to build a pointer list.

That's not really what I'm trying to say. You want to work from unknown, I want to work from known. That's sort of the difference. It's probably best to discuss some specific cases to illustrate.

Quote
the former is not practical when dealing with large numbers of pointers/strings. the latter is not reliable in the general case at all... isn't your issue with brute force and pointer searching one of reliability?

You caught me with the instruction pattern reliability. Indeed it is not. However, when the assembly code loads the pointer, catching it in the debugger is probably the only thing remotely practical. Let's say your pointer is called like this:

lda #$cb
ldx #$b654
jsr $4567

This is just one scenario on one platform. How are you going to scan and find any pointers in that fashion? With the opcodes separating it, it's not going to turn up in any pointer search you can possibly do unless you had wild cards to account for the unknown opcode. Snapping the debugger would catch them immediately in all cases with full certainty. Create a list, and you're done.

Quote
those go hand in hand. as stated above, a scanning utility for pointers will do much the same stuff a dumper would do... it seems odd to separate the two into separate programs.

That depends on the scan. A simple scan based on end of strings and a pointer format is reasonably related. But if you're going to blow it up into something smarter such as being able to parse scripting blocks via commands, work with instruction patterns etc., it becomes its own utility in my opinion. It's much more than a side feature to add to a dumper.

Quote
often you'll have nothing between strings other than nulls. in that case, that makes start-of-string detection pretty reliable. 100% reliable, in fact. and if you don't, well... the raw blind dump itself would not be "perfect" then, as the user would have to manually go back and filter out the chaff from the wheat. that is to say, the raw blind dump (your program output) would not be 'tangible'.

and in fact, if the only other bytes embedded between strings are bytes with small values (out of range of the table) or pointers (which you can sense), it's still 100% reliable even with data in between strings.

This is the most trivial of cases where the text would be so nice as to flow completely together after an end string. I'll give you that case. Any other case and you're searching in vain because you don't know where the text really starts. It seems like a better idea to me to work with what you know so a utility can intelligently find text or parse blocks. Maybe an example case would better clear the issue between us. Can you make up an example?

The raw blind dump mode is really for those one-off times when you want to dump something without need of any pointers. Maybe a few one-off strings or menu items, or a blob of text or something you want to take a peek at in a remotely more readable format. You're right: if you try to dump large chunks with it, it's probably not doing anything reliable or useful for that matter.

DaMarsMan

  • Hero Member
  • *****
  • Posts: 1288
  • Bring DQV
    • View Profile
    • DQ Translations!
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #24 on: August 10, 2010, 03:51:40 pm »
You caught me with the instruction pattern reliability. Indeed it is not. However, when the assembly code loads the pointer, catching it in the debugger is probably the only thing remotely practical. Let's say your pointer is called like this:

lda #$cb
ldx #$b654
jsr $4567

This is just one scenario on one platform. How are you going to scan and find any pointers in that fashion? With the opcodes separating it, it's not going to turn up in any pointer search you can possibly do unless you had wild cards to account for the unknown opcode. Snapping the debugger would catch them immediately in all cases with full certainty. Create a list, and you're done.

In my opinion, situations like this are best handled by an emulation core. I've seen examples in byuu's source of doing things like this. You can set the initial CPU registers and then loop through a table with given values. Doing this, you could set a breakpoint at the beginning of the $4567 function and step back an instruction to dump the offset of the ldx. I'm not sure how possible that is.

I coded a utility called pointer grabber that's in the utilities section of this site. It's not perfect and needs to be customized but what it does is scan for a pattern like "JSR $4567" and then dumps the 2 bytes preceding it. Obviously you get false positives with it but you can usually weed them out pretty easily. Lots of times games seem to have all these functions in the same clump so if you disassemble that bank, you'll be able to see them all right next to each other and go through by hand. It's painful, but may be a more accurate approach.
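The idea of scanning for a call and grabbing the two bytes in front of it can be sketched in a few lines of Python. This is only an illustration of the approach described above, not the actual pointer grabber source; the function name `grab_before_call` and the sample bytes are made up. It assumes the 65816 `JSR absolute` opcode `$20` followed by a little-endian 16-bit address.

```python
def grab_before_call(rom, target, call_opcode=0x20):
    """Find every 'JSR target' and return (offset, value) for the
    little-endian 16-bit word stored immediately before the call.

    Hypothetical helper: the real utility may work differently.
    """
    pattern = bytes([call_opcode, target & 0xFF, target >> 8])
    hits = []
    pos = rom.find(pattern)
    while pos != -1:
        if pos >= 2:  # need two bytes in front of the call
            value = int.from_bytes(rom[pos - 2:pos], "little")
            hits.append((pos - 2, value))
        pos = rom.find(pattern, pos + 1)
    return hits


# lda #$cb / ldx #$b654 / jsr $4567 / nop / ldx #$1234 / jsr $4567
sample = bytes.fromhex("a9cb a254b6 206745 ea a23412 206745")
for offset, value in grab_before_call(sample, 0x4567):
    print(f"word ${value:04X} at offset ${offset:X}")
```

The false positives mentioned above come from the fact that the three bytes `20 67 45` can also appear as data or as the operand of some other instruction, so the output still needs a manual pass.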


So, although that should probably be externally handled, here is at least what I'd prefer:

Option for dumping based on string offset order:
At the beginning of each string dump, store the offset into a table for lookup. Then, if the option is selected, you can organize and output by that later.

Option to remove duplicate strings:
If the offsets were stored in a table as stated above, you could simply group all pointers that pointed to the same offset. This way you can get a much cleaner script. I can see situations where you would want to NOT remove duplicates but I think it would be an awesome feature.
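A minimal Python sketch of both options, assuming the dumper has already collected `(pointer_offset, string_offset)` pairs while dumping. The function name and the sample offsets are hypothetical.

```python
from collections import defaultdict


def group_by_target(pointers):
    """Group pointers by the string offset they resolve to.

    pointers: iterable of (pointer_offset, string_offset) pairs.
    Returns a list of (string_offset, [pointer_offsets]) sorted by
    string offset, so each string appears once in text order.
    """
    groups = defaultdict(list)
    for pointer_offset, string_offset in pointers:
        groups[string_offset].append(pointer_offset)
    return [(s, sorted(groups[s])) for s in sorted(groups)]


# Pointer table order: the first and third entries share one string.
table = [(0x100, 0x9000), (0x102, 0x8000), (0x104, 0x9000)]
for string_offset, pointer_offsets in group_by_target(table):
    refs = ", ".join(f"${p:X}" for p in pointer_offsets)
    print(f"string at ${string_offset:X} <- pointers at {refs}")
```

Keeping the full list of pointer offsets per string, rather than discarding the duplicates, leaves enough information for an inserter to rewrite every pointer later, and a "do NOT remove duplicates" option would only change how the output is printed.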

Kajitani-Eizan

Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #25 on: August 10, 2010, 05:00:35 pm »
Let's say your pointer is called like this:

lda #$cb
ldx #$b654
jsr $4567

This is just one scenario on one platform. How are you going to scan and find any pointers in that fashion? With the opcodes separating it, it's not going to turn up in any pointer search you can possibly do unless you had wild cards to account for the unknown opcode. Snapping the debugger would catch them immediately in all cases with full certainty. Create a list, and you're done.

i am not sure what is going on there. is it 6502 asm? unfortunately i do not have experience with that. where is the pointer? regardless, unless there are pointer types i am grossly misunderstanding, if you can write it using #WRITE, you can search for it. for example, if b654 is the pointer in the example above, i don't see why you can't search for b654. you might end up getting more false positives than you'd like, since this isn't a four-byte pointer like i usually would work with and so there's less specificity, but you ought to be able to find it.

and if you would like a rough example of how data can be interspersed wherever (making my method useful):

Code: (ARM assembly)
...

ldr r0,=String1Pointer
ldr r0,[r0]
ldr r1,=SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1
ldr r2,=SomeOtherCrapDataAssociatedWithTheMenuOrWhatever2

...

pop r4-r7,r15   ; end of function

push r4-r7,r14 ; start of next function

...

ldr r0,=String2Pointer
ldr r0,[r0]

...

pop r4-r7,r15   ; end of function

push r4-r7,r14 ; start of next function

...

ldr r0,=String3Pointer
ldr r0,[r0]
ldr r1,=SomeOtherCrapDataAssociatedWithTheMenuOrWhatever3
ldr r1,[r1]

...

pop r4-r7,r15   ; end of function

; Data area

<String1Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever2>
<String2Pointer>
<String3Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever3>

probably a crappy example, but the point is that the data area will not have a set pattern you can exploit. you either brute force or pointer search it (the easy ways to do it) or you spend forever building a pointer list. (in this example with only 3 pointers, it's not bad, but if there are a lot then it becomes moderately to excessively tedious.) alternatively, instead of the data area being at the end of all the functions, there could be a small data area at the end of each function:

Code:
(Start Function 1)
...
(End Function 1)
<String1Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever2>

(Start Function 2)
...
(End Function 2)
<String2Pointer>

(Start Function 3)
...
(End Function 3)
<String3Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever3>

in this case, you either pointer search it (the easy way) or you spend forever building a pointer list.

Quote
That depends on the scan. A simple scan based on end of strings and a pointer format is reasonably related. But if you're going to blow it up into something smarter such as being able to parse scripting blocks via commands, work with instruction patterns etc., it becomes its own utility in my opinion. It's much more than a side feature to add to a dumper.

yeah, i was thinking of how cartographer already lets you specify how the pointers are set up. so the pointer format information is already being input into the program. all you'd have to do is generate pointers, and then search for those values, instead of just reading pointers.

not sure what you mean on that last part. what commands and instruction patterns are you talking about? i thought we're just talking about dumping text (and its associated pointers) from a game.

Quote
This is the most trivial of cases where the text would be so nice as to flow completely together after an end string. I'll give you that case. Any other case and you're searching in vain because you don't know where the text really starts. It seems like a better idea to me to work with what you know so a utility can intelligently find text or parse blocks. Maybe an example case would better clear the issue between us. Can you make up an example?

isn't that trivial case a common case? obviously if it's just way too complicated then this method wouldn't be usable for a general-purpose script ripper, e.g.:

Code: (rather tough to tell where strings start)
56 23 1C 08  (  S  W  D  )  L  o  n  g     S  w  o  r  d 00 84 23 1C 08  (  S  W  D  )  B  r  o  a  d     S  w  o  r  d 00

^-- a pointer to something
             ^-- some kind of identifier, maybe something points here too
                            ^-- the main string; there is a pointer to this somewhere

but if your script looks like this:

Code: (easy to tell where strings start)
L  o  n  g     S  w  o  r  d 00 00 00 00 00  B  r  o  a  d     S  w  o  r  d 00
^-- pointer pointing here
                                             ^-- pointer pointing here

then it's super easy to figure out where a string starts. once you know where it starts, and the user inputs information on the pointer format (e.g. endianness, bytewidth, offset, etc.), you know what the pointer pointing to it must look like. once you know that, you can search for it.
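As a rough Python sketch of that pointer search: find string starts in a null-separated text block, build the byte pattern a pointer to each start would have, and look for it in a user-specified area. All names here are hypothetical, and the defaults (4-byte little-endian pointers with a `0x08000000` base, as on GBA) are assumptions standing in for whatever pointer format the user enters.

```python
def string_starts(rom, start, end, terminator=0x00):
    """Offsets in [start, end) where a string begins, assuming strings
    are separated by nothing but terminator bytes."""
    starts = []
    prev_was_terminator = True  # region assumed to begin on a boundary
    for offset in range(start, end):
        is_terminator = rom[offset] == terminator
        if not is_terminator and prev_was_terminator:
            starts.append(offset)
        prev_was_terminator = is_terminator
    return starts


def encode_pointer(offset, base=0x08000000, width=4, byteorder="little"):
    """Bytes a pointer to `offset` would have under the given format."""
    return (offset + base).to_bytes(width, byteorder)


def search_pointers(rom, starts, area_start, area_end, **fmt):
    """Map each string start to every place its pointer bytes occur
    inside [area_start, area_end)."""
    found = {}
    for s in starts:
        needle = encode_pointer(s, **fmt)
        hits = []
        pos = rom.find(needle, area_start, area_end)
        while pos != -1:
            hits.append(pos)
            pos = rom.find(needle, pos + 1, area_end)
        found[s] = hits
    return found


# 16-byte data area (two pointers mixed with other data), then the text.
data_area = bytes.fromhex("1f000008 ffffffff 10000008 01020304")
text = b"Long Sword\x00\x00\x00\x00\x00Broad Sword\x00"
rom = data_area + text
starts = string_starts(rom, 0x10, len(rom))
print(search_pointers(rom, starts, 0x00, 0x10))
```

A string with an empty hit list has no pointer in the searched area, and a string with several hits needs a human to decide which ones are real, which is where the false-positive concern comes in.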
« Last Edit: August 10, 2010, 05:11:51 pm by Kajitani-Eizan »

Nightcrawler

Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #26 on: August 10, 2010, 06:00:35 pm »
i am not sure what is going on there. is it 6502 asm? unfortunately i do not have experience with that. where is the pointer? regardless, unless there are pointer types i am grossly misunderstanding, if you can write it using #WRITE, you can search for it. for example, if b654 is the pointer in the example above, i don't see why you can't search for b654. you might end up getting more false positives than you'd like, since this isn't a four-byte pointer like i usually would work with and so there's less specificity, but you ought to be able to find it.

65816. No, the pointer is $cbb654. The pointer load is separated across multiple opcodes. The pointer bytes are separated by opcodes. Along these lines often times the bank portion of the pointer is stored completely separate from the lower word. You can't write it using a single #WRITE. Understand now? Things like this are very common on my platform of expertise in which the methods you've described aren't useful. So now let's talk about yours.
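To illustrate the split, here is a small Python sketch. It assumes the three instructions assemble with an 8-bit accumulator and 16-bit index registers, giving the bytes `A9 CB A2 54 B6 20 67 45`; the pattern and function names are made up for this example. A contiguous search for the 24-bit value finds nothing, while a wildcard pattern tied to this exact instruction sequence can reassemble it.

```python
import re

# lda #$cb / ldx #$b654 / jsr $4567 (8-bit A, 16-bit X assumed)
code = bytes.fromhex("a9cb a254b6 206745")

# The pointer $CBB654 stored contiguously (little-endian) would be 54 B6 CB.
contiguous = (0xCBB654).to_bytes(3, "little")
print(contiguous in code)  # the three bytes never appear together

# Wildcard pattern for this one sequence: lda #?? / ldx #???? / jsr $4567.
SPLIT_LOAD = re.compile(rb"\xA9(.)\xA2(..)\x20\x67\x45", re.DOTALL)


def find_split_pointers(rom):
    """Return (offset, pointer) for each match of the sequence above,
    rebuilding the 24-bit pointer from the bank byte and the word."""
    results = []
    for match in SPLIT_LOAD.finditer(rom):
        bank = match.group(1)[0]
        word = int.from_bytes(match.group(2), "little")
        results.append((match.start(), (bank << 16) | word))
    return results


for offset, pointer in find_split_pointers(code):
    print(f"pointer ${pointer:06X} loaded at offset ${offset:X}")
```

The pattern only helps once the load sequence is already known, for example from a debugger, and a different routine or register width needs a different pattern. That is the sense in which this is a separate job from a generic pointer search.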

Quote
and if you would like a rough example of how data can be interspersed wherever (making my method useful):

Code: (ARM assembly)
...
<String1Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever2>
<String2Pointer>
<String3Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever3>

probably a crappy example, but the point is that the data area will not have a set pattern you can exploit. you either brute force or pointer search it (the easy ways to do it) or you spend forever building a pointer list. (in this example with only 3 pointers, it's not bad, but if there are a lot then it becomes moderately to excessively tedious.)

I understand how the pointer can be found here in the assembly code if you had a matching address. You have your full 4-byte pointer right there non-separated.  That's different from the case I described above. Easier. However, I do have further questions. Where do you start dumping? The assembly code? The data area? I'm assuming it's the data area. If it's the data area, where do you scan for a pointer match? The entire ROM?  You start finding strings in the data area. You get to the end of <String1Pointer>'s end terminator. Now you mark the start of <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>. Now what? Your search fails. So you continue on searching for every location until you hit a match again at <String2Pointer>? If I understand this correctly, this is prone to horrible error. Your pointer match could find the correct instruction. It could find <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1> that happened to have the same bytes. (4-bytes is probably unlikely but possible.) There could be two matches, one is the real, one is a fake.

And as I mentioned before, none of this even remotely works if the pointer isn't all together nice and neat.

Quote
isn't that trivial case a common case? obviously if it's just way too complicated then this method wouldn't be usable for a general-purpose script ripper, e.g.:

Not that common on my platforms of expertise. I find quite a bunch of separated pointers and short relative pointers. I suppose it's more common on GBA and later systems that always have neat absolute 32-bit values together like that. That's really the only case I can see this working for.

Kajitani-Eizan

Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #27 on: August 10, 2010, 07:11:32 pm »
65816. No, the pointer is $cbb654. The pointer load is separated across multiple opcodes. The pointer bytes are separated by opcodes. Along these lines often times the bank portion of the pointer is stored completely separate from the lower word. You can't write it using a single #WRITE. Understand now? Things like this are very common on my platform of expertise in which the methods you've described aren't useful.

? okay, so then how do you define the pointer in cartographer? i mean, can you use cartographer at all to rip from this pointer? if not, then this is outside the realm of our discussion, as the only way to handle it is to dump RAW and handle the repointing manually; you can't use pointer lists or whatever else to do it either. if cartographer can handle it, why wouldn't you be able to use the same pointer definition to search for the pointer?

Quote
The data area? I'm assuming it's the data area. If it's the data area, where do you scan for a pointer match? The entire ROM?

do you mean for the "brute force" method or the "pointer search" method? for the former, you are already in the pointer area; there is nothing to scan. you would follow each potential pointer and dump the potential string it points to. for the latter, the data area itself is the area you want to scan. the dumping area is where the text is.

i'm not sure i understand the rest of your analysis... i think my example may not have been clear enough. that, or the ASM is confusing. quick clarification:

in the function:
ldr r0,=String1Pointer  <-- load r0 with the address of String1Pointer.
ldr r0,[r0] <-- load the data at r0 into r0. (so, load the absolute pointer to the string into r0)

in the data area:
<String1Pointer> - four bytes. absolute pointer to the string.

in the text/dumping area:
S  t  r  i  n  g  1 00    <-- the string being pointed to by String1Pointer.
S  t  r  i  n  g  2 00
S  t  r  i  n  g  3 00 , etc.

for the "brute force" method, you iterate through the data area and follow any pointers you find there to their strings, and dump the pointers and strings. it's basically exactly what cartographer does right now with dumping from a pointer table, except it shouldn't crash when it runs into an invalid pointer (e.g. <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>); it just notes an error and continues to execute. for the "pointer search" method, you iterate through the strings in the text/dumping area and search in a specified area for pointers that point to these strings. the specified area should encompass the data area(s). then you can dump any pointers you find this way along with the strings. (i initially drew a comparison between this and RAW dump mode because you are dumping from the text, not the pointers.)

i hope the above clears it up. if not, please let me know.
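A minimal Python sketch of the "brute force" method as described: walk the data area in fixed 4-byte steps, follow anything that resolves into the text area, and log everything else as an error instead of stopping. The function name, the `0x08000000` base, and the sample bytes are assumptions for illustration, and this is not how Cartographer itself is implemented.

```python
def brute_force_dump(rom, area_start, area_end, text_start, text_end,
                     base=0x08000000, terminator=0x00):
    """Treat every aligned 4-byte little-endian value in the data area
    as a candidate pointer. Values landing inside the text area are
    followed and dumped; the rest are recorded as errors."""
    strings, errors = [], []
    for offset in range(area_start, area_end - 3, 4):
        value = int.from_bytes(rom[offset:offset + 4], "little")
        target = value - base
        if not (text_start <= target < text_end):
            errors.append((offset, value))  # note it and keep going
            continue
        end = rom.find(bytes([terminator]), target, text_end)
        if end == -1:
            end = text_end
        strings.append((offset, target, rom[target:end]))
    return strings, errors


data_area = bytes.fromhex("1f000008 ffffffff 10000008 01020304")
text = b"Long Sword\x00\x00\x00\x00\x00Broad Sword\x00"
rom = data_area + text
strings, errors = brute_force_dump(rom, 0x00, 0x10, 0x10, len(rom))
for pointer_offset, target, raw in strings:
    print(f"${pointer_offset:X} -> ${target:X}: {raw.decode('ascii')}")
print(f"{len(errors)} values skipped as invalid pointers")
```

Two limits are built into this sketch: it assumes the pointers sit on 4-byte boundaries, and any non-pointer value that happens to fall inside the text range will be dumped as if it were real.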

Quote
And as I mentioned before, none of this even remotely works if the pointer isn't all together nice and neat.

a) i'm still not totally convinced of this (see above), and b) i don't think it's relevant even if it doesn't work in that case. you would only use the option in cases where you want it/it works, right? unless you're saying that it would be useful only in rare cases, and thus not worth including... which i assume may be true of systems in your area of expertise, but is not true for other systems (e.g. GBA/NDS).

Quote
Not that common on my platforms of expertise. I find quite a bunch of separated pointers and short relative pointers. I suppose it's more common on GBA and later systems that always have neat absolute 32-bit values together like that. That's really the only case I can see this working for.

you can search for relative pointers as well. by separated pointers, do you mean where the load is split across two instructions or where the pointers are really far apart? if the former, see above. if the latter, pointer search excels in that very instance.
« Last Edit: August 10, 2010, 07:27:46 pm by Kajitani-Eizan »

Nightcrawler

Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #28 on: August 11, 2010, 08:57:24 am »
? okay, so then how do you define the pointer in cartographer? i mean, can you use cartographer at all to rip from this pointer? if not, then this is outside the realm of our discussion, as the only way to handle it is to dump RAW and handle the repointing manually; you can't use pointer lists or whatever else to do it either. if cartographer can handle it, why wouldn't you be able to use the same pointer definition to search for the pointer?

You can't define or dump it in Cartographer. You CAN add the pointer to a pointer list and dump that way. Writing the pointer back is more difficult, but Atlas can do it with some of its more advanced commands separating the write of upper and lower bytes of the pointer to different spots.

This was to exemplify a case off the top of my head your method wouldn't work for. Your method has limited application to absolute pointers where all bytes that make up the pointer are stored together. I'd prefer something with a more widespread application.
 
Quote
i'm not sure i understand the rest of your analysis... i think my example may not have been clear enough. that, or the ASM is confusing. quick clarification:

in the function:
ldr r0,=String1Pointer  <-- load r0 with the address of String1Pointer.
ldr r0,[r0] <-- load the data at r0 into r0. (so, load the absolute pointer to the string into r0)

in the data area:
<String1Pointer> - four bytes. absolute pointer to the string.

in the text/dumping area:
S  t  r  i  n  g  1 00    <-- the string being pointed to by String1Pointer.
S  t  r  i  n  g  2 00
S  t  r  i  n  g  3 00 , etc.

for the "brute force" method, you iterate through the data area and follow any pointers you find there to their strings, and dump the pointers and strings. it's basically exactly what cartographer does right now with dumping from a pointer table, except it shouldn't crash when it runs into an invalid pointer (e.g. <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>); it just notes an error and continues to execute. for the "pointer search" method, you iterate through the strings in the text/dumping area and search in a specified area for pointers that point to these strings. the specified area should encompass the data area(s). then you can dump any pointers you find this way along with the strings. (i initially drew a comparison between this and RAW dump mode because you are dumping from the text, not the pointers.)

i hope the above clears it up. if not, please let me know.
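
A minimal sketch of the "pointer search" method described above, in Python. It assumes 4-byte little-endian pointers whose bytes are stored together and 00-terminated strings. The function names and the `base` parameter (the constant added to a file offset to form the stored pointer) are made up for illustration and are not part of Cartographer or Atlas.

```python
import struct


def find_string_starts(data, text_start, text_end, terminator=0x00):
    """Return the offset of each terminated string in [text_start, text_end)."""
    starts = []
    pos = text_start
    while pos < text_end:
        end = data.find(bytes([terminator]), pos, text_end)
        if end == -1:
            break  # trailing bytes without a terminator are ignored
        starts.append(pos)
        pos = end + 1
    return starts


def pointer_search(data, string_offsets, search_start, search_end, base=0):
    """For each string offset, list where a 4-byte pointer to it is stored.

    The search is limited to [search_start, search_end), which should
    cover the data area(s). `base` is an assumed constant, e.g. the
    address the ROM is mapped at.
    """
    hits = {}
    for off in string_offsets:
        needle = struct.pack("<I", base + off)
        found = []
        pos = data.find(needle, search_start, search_end)
        while pos != -1:
            found.append(pos)
            pos = data.find(needle, pos + 1, search_end)
        hits[off] = found
    return hits
```

Strings that come back with an empty list have no pointer in the searched area. A dumper could report those instead of halting.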

I think I am clear, but I still think it is extremely limited in application. Let's talk 'brute force'.

1. Ok, you start in the data area. You dump what's pointed to by 4-byte <String1Pointer>. Move to the next 'pointer' which is <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>. The program won't crash on an 'invalid' pointer. However, what if this data is 1 byte long and the real pointers continue after it? You were dumping 4-byte pointers, so everything from the first invalid pointer onwards is thrown off unless the pointers and data are 4-byte aligned.

Now 'pointer search' method.

2. I see a little more merit in this method. However, this goes back to the first paragraph where it's only going to work where the pointers are absolute and bytes stored together. I believe that to be a very limited application overall. Maybe it happens more often with GBA/NDS, but I don't see that as often on other platforms.

Quote
a) i'm still not totally convinced of this (see above), and b) i don't think it's relevant even if it doesn't work in that case. you would only use the option in cases where you want it/it works, right? unless you're saying that it would be useful only in rare cases, and thus not worth including... which i assume may be true of systems in your area of expertise, but is not true for other systems (e.g. GBA/NDS).

Yes, that's what I'm saying. It's useful only in specific cases that aren't too common on some platforms, but more common on others. I'd rather work on something with more widespread, flexible application.

Quote
you can search for relative pointers as well. by separated pointers, do you mean where the load is split across two instructions or where the pointers are really far apart? if the former, see above. if the latter, pointer search excels in that very instance.

You can't search for relative pointers when you don't know what they're relative to. Games I've seen that use this have a base value they use and calculate the actual pointer by manipulating a relative pointer and adding it to the base. What is the base? You wouldn't know without looking through the code. It may be as simple as the start of the pointers or text, it may be more complicated than that. What is the manipulation? It may be a shift operation, it may be more.
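
To illustrate why the base and the manipulation matter, here is a small Python sketch that resolves a table of 2-byte relative pointers. The `base` and `shift` parameters are assumptions standing in for whatever the game's code actually does. The same raw table gives different targets for each guess.

```python
import struct


def resolve_relative(data, table_offset, count, base, shift=0):
    """Turn `count` 2-byte little-endian relative pointers into file offsets.

    target = base + (raw << shift). Both `base` and `shift` are guesses
    that normally have to be confirmed by reading the game's code.
    """
    targets = []
    for i in range(count):
        (raw,) = struct.unpack_from("<H", data, table_offset + 2 * i)
        targets.append(base + (raw << shift))
    return targets
```

With the wrong base or shift the resolved offsets land in unrelated data, which is why a scanner cannot validate relative pointers without being told (or guessing) both values.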

I meant the load AND the pointer bytes themselves are split. The load is across multiple instructions and the pointer bytes themselves are stored separately and combined by the instructions.

To reiterate, I see a very limited and specific application to your method. I'd much rather see a more program intelligent way to address these situations that would handle your case and some of the ones I have brought up. Off the top of my head, a pointer list generating program with some ability for patterns, wild cards, equations, and searching might be what the doctor ordered. Think along the lines of relative searching programs, but applied to finding and handling complex mixes of pointers and strings. That's one way I envision most of these cases being handled.

TransCorp - Over 20 years of community dedication.
Dual Orb 2, Wozz, Emerald Dragon, Tenshi No Uta, Glory of Heracles IV SFC/SNES Translations

Klarth

  • Sr. Member
  • ****
  • Posts: 484
    • View Profile
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #29 on: August 11, 2010, 11:28:30 am »
I think the community overall could benefit from a quality pointer scanning / pointer table finding feature.  I'll give my view and experience regarding this.

Nightcrawler's sample is very much the same case Cless wrote a program for: PS-EXE text dumping (many PS-EXE strings use hard coded pointers, which are split between multiple instructions). It was basically an instruction interpreter that output an Atlas script with separate high and low word writes for 32-bit pointers. I agree with Nightcrawler that this is an example of something that should be its own utility rather than a niche dumper feature.
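
As a rough illustration of the high/low word split, here is a Python sketch. It assumes the pointer is built by a MIPS-style lui/addiu pair, where the low half is sign-extended before being added, so the high half has to be bumped by one whenever the low half is 0x8000 or above. That instruction pairing is an assumption on my part (a lui/ori pair would not need the adjustment), and the function names are hypothetical, not taken from Cless's program or Atlas.

```python
def split_hi_lo(address):
    """Split a 32-bit address into (hi, lo) halves for a lui/addiu pair."""
    lo = address & 0xFFFF
    # Compensate for the sign extension of `lo` done by addiu.
    hi = ((address + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def join_hi_lo(hi, lo):
    """Rebuild the address the way lui followed by addiu would."""
    signed_lo = lo - 0x10000 if lo & 0x8000 else lo
    return ((hi << 16) + signed_lo) & 0xFFFFFFFF
```

A repointing tool would then write `hi` and `lo` into the two instructions separately, which matches the separate high and low word writes described above.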

However, I do believe a generic pointer scan feature should be included with a good dumper, even with the limitations discussed.  There are two basic steps in script dumping: script interpretation and character encoding.  Character encoding is the obvious step, but people tend to forget about the script interpretation part, which is the weak part of currently available generic dumpers.  Script interpretation involves various aspects like detecting end string values (to print out debug info in the script or for formatting), ignoring padding bytes for boundary aligned strings, or interpreting other various control codes necessary for dumping.  A pointer scanning utility definitely needs to be able to detect the address that the string begins at in order to validate a pointer as being meaningful.  At this point, such a utility is a table library shy of becoming a text dumper.
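
A minimal Python sketch of the "does this address begin a string" check, assuming single-byte end tokens and optional padding bytes between boundary-aligned strings. The function name and the default byte values are hypothetical; a real dumper would take them from the table file.

```python
def is_string_start(data, offset, text_start, text_end,
                    terminator=0x00, padding=0x00):
    """Return True if `offset` looks like the first byte of a string.

    A start is accepted when it is the first byte of the text area, or
    when the nearest preceding non-padding byte is an end token.
    """
    if not (text_start <= offset < text_end):
        return False
    if data[offset] in (terminator, padding):
        return False  # treated as filler here; empty strings are ignored
    if offset == text_start:
        return True
    prev = offset - 1
    # Skip padding bytes used to align strings to a boundary.
    while padding != terminator and prev > text_start and data[prev] == padding:
        prev -= 1
    return data[prev] == terminator
```

A pointer scanner could use a check like this to reject candidate pointers that land in the middle of a string.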

The only issue is how feature filled a generic pointer scanner should be.  I think a simple formula (or just adding/subtracting a constant) would work well for most cases where pointers are not split.

Kajitani-Eizan

  • Hero Member
  • *****
  • Posts: 547
  • You couldn't kill me if I tried to let you
    • View Profile
    • Kajitani-Eizan's Patch Site
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #30 on: August 11, 2010, 11:38:56 am »
ah, i wrote this up before klarth's post... but i think most of it should still be fresh. apologies if i'm wrong about this... but even the PS-EXE case can be handled and be made much more convenient using pointer search, if there are a lot of pointers that use the same high word. you'd still have to do some shenanigans for when pointers change high words, and then hand-modify the atlas scripts to include the high word/low word thing, but it would still be useful to use pointer search. obviously a custom tool would be preferable, though.

Quote
You can't define or dump it in Cartographer. You CAN add the pointer to a pointer list and dump that way.

that doesn't make any sense. if you can put it in a pointer list, you can search for it. (or did you simply mean that cartographer doesn't currently support it, but your new program will?)

Quote
Your method has limited application to absolute pointers where all bytes that make up the pointer are stored together. I'd prefer something with a more widespread application.

???? this is not limited at all. this is a very, incredibly, widespread case. in fact i would be willing to bet money that this is a more common case than when you have multiple instructions being used together in such a way as to make this method excessively inconvenient (meaning, you might as well just use single pointer dumps) or impossible (would bet money that this would be a laughably fringe case). at the very minimum, it is very much applicable to GBA and NDS hacking, which are fairly popular systems to romhack.

(and no, as i said before, it does not have to be absolute pointers -- relative pointers are fine too. though absolute is preferred, for increased specificity due to lack of starting at 0.)
 
Quote
I think I am clear, but I still think it is extremely limited in application. Let's talk 'brute force'.

1. Ok, you start in the data area. You dump what's pointed to by 4-byte <String1Pointer>. Move to the next 'pointer' which is <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>. The program won't crash on an 'invalid' pointer. However, what if this data is 1 byte long and the real pointers continue after it? You were dumping 4-byte pointers, so everything from the first invalid pointer onwards is thrown off unless the pointers and data are 4-byte aligned.

at least on GBA/NDS, it makes no sense to do this, as it is an incredible amount of extra work that no one would go through the effort to do, seeing as how there is no benefit whatsoever. pointers should always be word-aligned or maybe half-word aligned (for 2-byte relative pointers) depending on how the code is set up. is that sort of thing commonplace in 6502 or 65816? even so, you can still brute force it by checking for a pointer at every location instead of limiting yourself to being word-aligned.
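
A rough Python sketch of the brute-force scan, assuming 4-byte little-endian pointers and a known text area. All names are hypothetical. `step=4` gives the word-aligned scan, while `step=1` checks for a pointer at every location, which is the fallback mentioned above for data that is not aligned.

```python
import struct


def brute_force_scan(data, scan_start, scan_end, text_start, text_end,
                     base=0, step=4):
    """Treat every `step`-th offset as a candidate 4-byte pointer.

    Returns (valid, errors): `valid` holds (pointer_offset, target)
    pairs whose target falls inside the text area, `errors` holds the
    offsets that did not. Invalid candidates are noted, never fatal.
    """
    valid, errors = [], []
    for pos in range(scan_start, scan_end - 3, step):
        (value,) = struct.unpack_from("<I", data, pos)
        target = value - base
        if text_start <= target < text_end:
            valid.append((pos, target))
        else:
            errors.append(pos)
    return valid, errors
```

Scanning byte by byte finds pointers that follow a 1-byte field, but it also raises the chance of false positives, so the hits still deserve a look by eye.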

Quote
Now 'pointer search' method.

2. I see a little more merit in this method. However, this goes back to the first paragraph where it's only going to work where the pointers are absolute

just to reiterate, no.

Quote
and bytes stored together

again just to reiterate, this is a very frequent occurrence. certainly frequent enough to merit inclusion.

Quote
but I don't see that as often on other platforms.

which other platforms? even without having any experience with anything but PC programming and GBA/NDS ASM i would again be willing to bet money that just about any 32-bit platform would find my method to be applicable, with varying degrees of specificity and convenience improvement vs. the alternatives. do you have real experience with many different platforms to be able to make this kind of statement...

Quote
Yes, that's what I'm saying. It's useful only in specific cases that aren't too common on some platforms, but more common on others. I'd rather work on something with more widespread, flexible application.

... and more importantly, this kind of judgment? if not, what you're essentially saying is "i haven't seen much use for this on NES/SNES, so it's not worth including". which i find curious considering NES/SNES are not the only systems people are romhacking, and in fact there is ongoing discussion right now about how there should be more documentation for hacking newer systems.

Quote
You can't search for relative pointers when you don't know what they're relative to. Games I've seen that use this have a base value they use and calculate the actual pointer by manipulating a relative pointer and adding it to the base. What is the base?

this is irrelevant. you would need to know the base to use any method at all other than RAW dumping.

Quote
What is the manipulation? It may be a shift operation, it may be more.

as long as the least significant part of the pointer is simply added to the result (which is necessarily true if the string being pointed to can start on any byte/doesn't need to be word or halfword aligned, which i suspect forms the majority of cases), this is irrelevant.

Quote
I meant the load AND the pointer bytes themselves are split. The load is across multiple instructions and the pointer bytes themselves are stored separately and combined by the instructions.

once again irrelevant, in the example you gave, because it was basically a relative pointer with offset $cb0000. if each pointer uses a different initial instruction, then yes, it would not be convenient enough to be worth using (and coincidentally enough (not), as i pointed out above, pointer lists would also be useless, unless you're keeping track of that initial instruction for each individual entry... in which case it's not really a pointer list anymore). if a lot of pointers use $cb, however, it would be very useful.

Quote
I'd much rather see a more program intelligent way to address these situations that would handle your case and some of the ones I have brought up. Off the top of my head, a pointer list generating program with some ability for patterns, wild cards, equations, and searching might be what the doctor ordered.

implementing regex or some kind of complicated parser would be very powerful indeed. however, it would probably be a pain to do -- both on your end and the user end. much simpler, both for yourself and the user, would be to simply include pointer search and brute force. (not that you have to actually DO anything to support brute force, other than not have it crash or halt execution when it hits an invalid pointer :P)
« Last Edit: August 11, 2010, 11:46:31 am by Kajitani-Eizan »

Nightcrawler

  • Hero Member
  • *****
  • Posts: 5757
    • View Profile
    • Nightcrawler's Translation Corporation
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #31 on: August 11, 2010, 02:05:44 pm »
Klarth:
I caught your post squeezed in there. I can see your logic. We'll see how it goes. No guarantees that my utility will serve these purposes.

Kajitani-Eizan:

Quote
that doesn't make any sense. if you can put it in a pointer list, you can search for it. (or did you simply mean that cartographer doesn't currently support it, but your new program will?)

You, as a human, physically add it to the list file. The list is loaded by the dumper and dumps them. This is how pointers that no program can currently detect can still easily be dumped. That doesn't make sense?

Quote
???? this is not limited at all. this is a very, incredibly, widespread case. in fact i would be willing to bet money that this is a more common case than when you have multiple instructions being used together in such a way as to make this method excessively inconvenient (meaning, you might as well just use single pointer dumps) or impossible (would bet money that this would be a laughably fringe case). at the very minimum, it is very much applicable to GBA and NDS hacking, which are fairly popular systems to romhack.

You believe your method is not limited at all? I'm not going to argue with you. As surely as it is common on 32-bit platforms, it is not so common on 8-bit and 16-bit platforms. It can't be, because of their register vs. pointer size mismatches. An 8-bit register can't hold a 16-bit pointer, for example. The reason you see it so frequently on GBA is due to the way the ARM series (and many modern) processors work vs. pointer size. NDS is immaterial to me as I do not support hacking current generation systems. I prefer a solution applicable to old and new, ideas for which were already discussed. I am of course also biased to adding features that directly benefit my own projects since that's the primary reason for the tool's existence in the first place.

Quote
which other platforms? even without having any experience with anything but PC programming and GBA/NDS ASM i would again be willing to bet money that just about any 32-bit platform would find my method to be applicable, with varying degrees of specificity and convenience improvement vs. the alternatives. do you have real experience with many different platforms to be able to make this kind of statement...

Is this the e-peen battle question? :P Over the past 13 years I have done at least some degree of hacking on GB, SMS, GENESIS, NES, SNES, TG16, GBA and PC. I also know exactly what you're saying about 32-bit ARM CPUs and how they function with byte alignment. I am an electrical engineer and work with firmware and hardware design regularly. </end e-peen>

You want to discount all other platforms as less relevant and you're upset I don't give enough weight to GBA, PC, and current generation systems. Nothing we can do about that but respect one another's opinion on the matter. You've made your point. The feedback is appreciated. I will take it under advisement. :)
« Last Edit: August 11, 2010, 02:30:12 pm by Nightcrawler »

Kajitani-Eizan

  • Hero Member
  • *****
  • Posts: 547
  • You couldn't kill me if I tried to let you
    • View Profile
    • Kajitani-Eizan's Patch Site
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #32 on: August 11, 2010, 09:52:48 pm »
Quote
You, as a human, physically add it to the list file. The list is loaded by the dumper and dumps them. This is how pointers that no program can currently detect can still easily be dumped. That doesn't make sense?

no, it doesn't. if you, as a human, can physically add it to the list file, a program can search for it and find it (probably). i think i have said this several times now, so i should probably do something different: can you explain how a human can compile a list of pointers that cannot be found by a program? what is the impediment to the program finding them? exactly what is the complicated analysis operation that the human is doing that the program cannot? what extra information does this "pointer list" that a human would compile contain besides the "main" part of the pointer itself, and why, if this extra information were provided to the program, would the program not be able to find the pointers in question?

Quote
The reason you see it so frequently on GBA is due to the way the ARM series (and many modern) processors work vs. pointer size.

okay, now we're getting somewhere. so you admit that it is something that is seen "frequently" and thus not "very limited" in scope.

Quote
NDS is immaterial to me as I do not support hacking current generation systems. I prefer a solution applicable to old and new, ideas for which were already discussed. I am of course also biased to adding features that directly benefit my own projects since that's the primary reason for the tool's existence in the first place.

oh. well, if you would just say this in the first place, it would lead to a lot less confusion. what you want to work on is up to you. however, when you claim something is "not tangible" and "limited" in scope when that's simply not true, it leads to lots of unnecessary discussion :P

i apologize if it seemed i was trying to initiate e-penis comparisons... i was just very confused as to how you could make sweeping statements about how limited in scope this method would be, when it is not. now that you have clarified that you are focusing only on older systems, essentially disqualifying most of the wide variety of systems that i was assuming would find this method useful, it makes more sense. however...

Quote
You want to discount all other platforms as less relevant and you're upset I don't give enough weight to GBA, PC, and current generation systems. Nothing we can do about that but respect one another's opinion on the matter. You've made your point. The feedback is appreciated. I will take it under advisement. :)

um, no. i am not discounting other platforms as less relevant. that is what you are doing, ironically. i am proposing a method that i think is very useful for newer platforms. including it would not harm support for older platforms in any way whatsoever; it would just make things easier for those who happen to be hacking those newer platforms. i'm not sure why a feature should be excluded solely on the basis of it not necessarily being useful for the older generation, nor on the basis that it is not "tangible" or "reliable". when you hack a game, not everything you do will be "perfect" in the way you seem to be insistent on. if you use slightly "imperfect" methods to achieve a clean and properly working end result, i don't see what the problem is. put another way, i don't see the value in pursuing such an... intangible... ideal.

anyway, i suppose i have said my part... whether you want to include it or not is up to you :P
« Last Edit: August 11, 2010, 10:15:04 pm by Kajitani-Eizan »

Tauwasser

  • Hero Member
  • *****
  • Posts: 1392
  • Fantabulous!!
    • View Profile
    • My blog
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #33 on: August 12, 2010, 06:49:30 am »
Quote
at least on GBA/NDS, it makes no sense to do this, as it is an incredible amount of extra work that no one would go through the effort to do, seeing as how there is no benefit whatsoever. pointers should always be word-aligned or maybe half-word aligned (for 2-byte relative pointers) depending on how the code is set up. is that sort of thing commonplace in 6502 or 65816? even so, you can still brute force it by checking for a pointer at every location instead of limiting yourself to being word-aligned.

While this is certainly true for a majority of games, some games do go the extra mile and load pointers byte-wise. Just look at the Pokémon franchise. I also saw it in another Japanese ROM, but alas, I forgot the name of that game. However, I think all pointers used by ASM routines will be aligned, since no compiler will split those up, I hope. For pointers embedded in some data, though, this doesn't hold.

cYa,

Tauwasser

Kajitani-Eizan

  • Hero Member
  • *****
  • Posts: 547
  • You couldn't kill me if I tried to let you
    • View Profile
    • Kajitani-Eizan's Patch Site
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #34 on: August 12, 2010, 07:50:27 am »
LOL

to be fair, i myself coded a routine that loaded pointers using half-words. but this was to allow you to stick pointers directly into/on top of double byte text strings, i.e. it was a rom hack. i'm pretty confused as to why a compiled, non-hacked game would need to do this, because (a) the only reason i can think of is if they too stuck pointers into text strings and (b) there's no good reason for them to do (a), i think.

Tauwasser

  • Hero Member
  • *****
  • Posts: 1392
  • Fantabulous!!
    • View Profile
    • My blog
Re: Translations: Table File Standard, Generic Dumpers, and more!
« Reply #35 on: August 12, 2010, 08:02:51 am »
Well, it's for in-game scripting routines. Possibly because they wanted to save space by using variable-width commands instead of words. So one command can be 6 bytes long, the next only 3 etc. Therefore pointers won't be aligned and need to be read byte-wise.
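
A small Python sketch of why such pointers end up unaligned. The opcode table below is entirely made up for illustration (it is not the command set of any real game): each command has its own length, so a 4-byte pointer operand can start at any byte offset and has to be assembled byte-wise.

```python
# Hypothetical commands: opcode -> (total length, offset of a 4-byte
# little-endian pointer operand within the command, or None).
COMMANDS = {
    0x01: (3, None),  # opcode + two 1-byte arguments
    0x02: (6, 1),     # opcode + 4-byte pointer + 1-byte argument
    0xFF: (1, None),  # end of script
}


def collect_script_pointers(data, start):
    """Walk a script from `start` and return (field_offset, value) pairs."""
    pointers = []
    pos = start
    while pos < len(data):
        opcode = data[pos]
        if opcode not in COMMANDS:
            raise ValueError(f"unknown opcode {opcode:#04x} at {pos:#x}")
        length, ptr_at = COMMANDS[opcode]
        if ptr_at is not None:
            field = pos + ptr_at
            # Byte-wise read, so the field's alignment does not matter.
            value = int.from_bytes(data[field:field + 4], "little")
            pointers.append((field, value))
        if opcode == 0xFF:
            break
        pos += length
    return pointers
```

A word-aligned scan would miss pointer fields like these, whereas a walker that knows the command lengths finds them directly.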

cYa,

Tauwasser