Romhacking.net

General Category => News Submissions => Topic started by: RHDNBot on July 09, 2010, 02:09:19 pm

Title: Translations: Table File Standard, Generic Dumpers, and more!
Post by: RHDNBot on July 09, 2010, 02:09:19 pm
(http://www.romhacking.net/newsimages/newsimage968a.png)

Update By: Nightcrawler

Nightcrawler has been working hard on some things that may be of interest to the translation community. The first is a draft table file standard: a document that aims to be a reference for the file format, an explanation for newcomers, and a helpful reference for programmers. A first draft has already been written. Suggestions and feedback are welcome in hopes that some may choose to adopt it. Adopting a standard would allow for standardized table file features and compatible utilities going forward. Additional details can be found on TransCorp.

The second item of interest is a yet-to-be-named GUI generic dumper aiming to meet or exceed Cartographer (http://www.romhacking.net/utils/647/) in functionality for its first release. It will be the first utility to follow the new table file format standard (when finalized). An option to output in Atlas 1.1 (http://www.romhacking.net/utils/224/) compatible format is expected. Complementary native insertion functionality is planned, but the first release will focus on dumping functionality only.

In addition, Nightcrawler has been working on several other items. He's been tinkering with developing UVWF (SNES Universal VWF) to be applied to the sprite based text in Terranigma. He's been working on continuing the Tenshi No Uta Project. Lastly, he's been working with electrochip on experimenting with code that doesn't work on a home made flash cart, but does on copiers and BSNES to better understand bare bones initialization on the hardware for a potential learner's kit.

Plenty of additional information on all subjects can be found over at TransCorp!

Relevant Link: (http://transcorp.parodius.com)
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: tc on July 09, 2010, 02:27:40 pm
Oh wow! I always treated my press for a Terranigma VWF as a running joke. I had NO idea anyone really was listening to me. :laugh:
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Next Gen Cowboy on July 09, 2010, 03:15:56 pm
This is really something, considering he has been working on it for a long while now, with tons of various complications, and then bam! Must have been hauling ass the last few weeks?
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: C_CliFF on July 10, 2010, 08:59:15 am
I'm really looking forward to this. Keep up the good work NC!  :thumbsup:

It's also nice to see that Mattias is still active in our small Swedish RH community. I thought I was the only one left.
We talked about the VWF in Terranigma seven years ago, so I'm looking forward to seeing this in his translation.

Say hello from me if you talk to him.  :)

-C_CliFF
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on July 15, 2010, 01:59:29 pm
Only interest in Terranigma?

Nobody but Gil Galad and DaMarsMan are interested in the standard?  Wow! I must have a golden standard that will be adopted by all. Or... maybe it's so bad, nobody can even begin to talk about it and it will just be totally ignored. Ha! At least it will be over-glorified notes for myself then!  :laugh:
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: DarknessSavior on July 15, 2010, 02:02:40 pm
I'm more interested in your dumper/inserter than anything else. I haven't had any trouble working with tables in the past, regardless of the way the game handled text. Dumping and inserting text, depending on the game, can be a real pain, though.

~DS
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Gideon Zhi on July 15, 2010, 04:12:59 pm
Honestly, I'm almost past the point where I need a dumper. At this point, most of my scripts are at least translated if not inserted. The generic dumper could save some time in some cases since it dumps into Atlas format, but I'm at the point where the games it'll be useful on would almost require a custom job, or at the very least a hacky workaround (such as dumping from RAM and inserting to ROM, which no generic dumper is probably going to be able to handle.)
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Pennywise on July 15, 2010, 04:43:53 pm
I suppose I'm interested in it. For the most part, Cartographer gets the job done, but a GUI would be nice.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on July 15, 2010, 05:27:16 pm
DS:
The dumper/inserter will use the established table standard, so they go together. You say you haven't had any trouble working with 'tables' in the past. What tables? The one you use with Windhex? The one you use with Thingy? The one with Hexposure? The one with Translhextion? The one you use with your own custom utility? The one with Atlas? That's the kind of thing I'd like to see go away moving forward with utilities. We can have some compatibility between utilities, set an abstraction layer for tables (more interesting for utility creators), and rely on a standard feature set. Oh well. Our community likes to stay in the stone ages. Why should this be any different? They like their stones. :)

Gideon:
So, we've declared nothing here is of use to you? Fantastic! Wait... How is this contributing to the thread again? :P


Boy, I'll come back in 10 years and try to push you guys forward again. Timing seems to be way off on this one.  :-\
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: DarknessSavior on July 15, 2010, 05:49:00 pm
I've used just about every hex editor there is for romhacking, including Thingy, Thing32, Goldfinger, Windhex, Hexposure, and I'm sure I could dig up some others. And I've used the same type of table for each one, and never had an issue. I've used them to dump romjuice scripts (though, I admit I make a separate table with the "\r" and "\n" stuff) and insert Atlas ones with no problem.

If anything, I think perhaps there should be a standard for the basic <End> and <Linebreak> control codes. Some sorta standard value so that you don't have to set up the table to dump them properly.

But I'm more interested in the dumper/inserter because I don't really like romjuice, Cartographer is still unfinished (and I couldn't get it to dump ActRaiser 2, for some reason), and Atlas has always given me trouble. I'd like something a little more streamlined and simple to work with. Oh, and a GUI. I don't like stones. =P

~DS
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: tc on July 15, 2010, 05:55:11 pm
Closest thing to 'tables' I'd poked around in a few minutes, was an NES game typo. Turns out the corrected version was too long to fit and would've needed real hacking. :P
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Tater Bear on July 16, 2010, 12:40:42 am
Personally it sounds like a great idea. I am all for it, put an article on the site for newbies to use the new proposed standard and the new generation of romhacker can/will reap the rewards and benefits  :crazy:
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: C_CliFF on July 16, 2010, 12:21:54 pm
Only interest in Terranigma?

Actually, I was more interested in the generic dumper. The problems I've had in Cartographer have been with the base pointer function. How will this function work in your program? Will you be able to add and subtract values? (the Base Pointer)

-C_CliFF
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on July 16, 2010, 09:36:38 pm
In the testing version right now, all pointers are added to the global offset. If the pointers are relative, you'd set the global offset to the pointer table start. If the global pointer is 0, the pointers are effectively absolute. Then, there's the option to use the pointer location as the base.

It certainly has not been finalized yet. If you give me a few scenarios of what you've been dealing with and how Cartographer doesn't work, I can see if they fit into the system I have or if I can do something to handle them. I certainly have not thought of every pointer scenario and would be happy to take a look at as many as possible.
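
A minimal Python sketch of the offset handling described above, for illustration only. The function `resolve_pointer` and its parameter names are hypothetical and are not taken from the utility itself; it only shows how a global offset and the "pointer location as base" option could combine.

```python
def resolve_pointer(raw_value, global_offset=0, pointer_location=None):
    """Turn a raw pointer value into a file offset.

    Hypothetical helper: names and defaults are assumptions, not the
    dumper's real interface.
    """
    if pointer_location is not None:
        # Option: the pointer's own location acts as the base.
        return pointer_location + raw_value
    # Default: every pointer is added to the global offset.
    return global_offset + raw_value


# Global offset of 0: the pointer is effectively absolute.
print(hex(resolve_pointer(0x1234)))
# Relative pointers: global offset set to the pointer table start.
print(hex(resolve_pointer(0x0010, global_offset=0x8000)))
# Pointer location used as the base.
print(hex(resolve_pointer(0x0010, pointer_location=0x8004)))
```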
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Kajitani-Eizan on August 06, 2010, 09:39:53 am
one problem i've had with cartographer is when i have a bunch of pointers (GBA; 4 byte absolute pointers with 0x08000000 offset to be subtracted) in an area, but they're not necessarily neatly arranged. they might have some other data in between them, not necessarily in any easily-defined pattern. i want to dump all of them. so i tried just having it dump every 4 bytes. but cartographer crashes if you try that, because instead of hitting an invalid pointer and going "oh, that's invalid, let me just output an error for that one and move on to the next", it goes "durr no error checking whatsoever, i are crash" :P

and another feature request: specify a pointer type and possibly a pointer range, specify the text block, and then dump from the text block but search for pointers as you go. so kind of like the RAW method, except you actually get pointers out of it. this is pretty much the method i've been using for my personal custom tools but it would be nice for it to be universally available.

depending on how you implement everything, this might be a pain, but one more feature request: same as above, except it sorts the text/#WRITE commands by location of the pointer. so if the text block is World <$00> Hello <$00>, and it finds a pointer to World at $200 and Hello at $100, the output is #W32($100) / Hello<END> / #W32($200) / World<END>, or somesuch.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on August 06, 2010, 01:32:30 pm
one problem i've had with cartographer is when i have a bunch of pointers (GBA; 4 byte absolute pointers with 0x08000000 offset to be subtracted) in an area, but they're not necessarily neatly arranged. they might have some other data in between them, not necessarily in any easily-defined pattern. i want to dump all of them. so i tried just having it dump every 4 bytes. but cartographer crashes if you try that, because instead of hitting an invalid pointer and going "oh, that's invalid, let me just output an error for that one and move on to the next", it goes "durr no error checking whatsoever, i are crash" :P

I think you are asking for the same or similar thing DaMarsMan did on TransCorp:
http://transcorp.parodius.com/forum/?num=1273690996/4#4

I would support pointer lists for properly supporting these kinds of cases. The drawback is you would need to create the pointer list manually. It would probably be the responsibility of another utility to help automate the scanning or detection of  pointers and generate a list so you don't have to do it manually. I am not too interested in features that don't result in something tangible. Such 'scanning' functions will always generate false positives and misses.

I could also probably make your workaround of just dumping every 4 bytes work. The only issue I see is probably what crashes Cartographer. You get a pointer that exceeds the bounds of the file. I could handle that exception and output an empty string for something like that. However, it does make sense to generate an error upon an invalid pointer. I suppose an option to force/skip invalid pointers could accommodate both. I'll give it some thought. I wouldn't imagine it comes up too often where you'd actually want to dump invalid pointers. At least it doesn't for me. Typically an invalid pointer indicates to me that I don't really understand what's going on and need to figure it out so it's not invalid anymore. ;)
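
As a rough illustration of the force/skip option, here is a Python sketch that reads every 4 bytes as a GBA-style pointer (0x08000000 subtracted) and either records an empty result or raises on an out-of-bounds target. The name `read_pointers` and the `skip_invalid` flag are assumptions made for this example, not features of any existing tool.

```python
def read_pointers(rom, start, end, base=0x08000000, skip_invalid=True):
    """Read 32-bit little-endian pointers every 4 bytes in [start, end).

    Hypothetical sketch: returns (location, target) pairs, with target
    set to None when the pointer falls outside the file.
    """
    results = []
    for loc in range(start, min(end, len(rom)) - 3, 4):
        value = int.from_bytes(rom[loc:loc + 4], "little")
        target = value - base
        if 0 <= target < len(rom):
            results.append((loc, target))
        elif skip_invalid:
            # Skip mode: note the bad pointer and move on.
            results.append((loc, None))
        else:
            # Strict mode: an invalid pointer is an error.
            raise ValueError(f"invalid pointer ${value:08X} at ${loc:X}")
    return results


rom = bytearray(16)
rom[0:4] = (0x08000008).to_bytes(4, "little")  # valid: points at offset 8
rom[4:8] = b"\xff\xff\xff\xff"                 # invalid: far past the end
print(read_pointers(rom, 0, 8))
```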

Along these lines, I've always thought about a 'rule' based scanner that can find proper pointers based on the rules you feed it about the game's scripting blocks. It's never 'random' spacing between pointers. You just need to learn what the data is in between so you can scan the block and find the pointers using known information. I've done something similar for projects of mine. I parse the block with known rules and find valid text to dump.

With linked entries you can almost do that already if you know enough commands. If you set up your table fancy with line breaks after known linked entries/commands and their parameters, you can single out the actual text when it gets to it. Basically, you set up your table to 'parse' the data like:

Say you have this hex string:

$f2 $01 $03 $25 $37 $55 $45 valid text here $45 $62

In your table you'd have something like:

$f2=\n\r<create_window>,3
$37=\n\r<clear_window>,1
$45=\n\r<text_command>\r

You'd get an output of something like:

//<create_window><$01><$03><$25>

//<clear_window><$55>

//<text_command>
//valid text here

So, you see you can also pretty much parse the block that way too with a RAW dump.
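
The same idea in a small Python sketch, assuming a simplified table where each linked entry maps a byte to a label and a parameter count, and assuming every other byte is plain ASCII text. Both the `LINKED` dictionary and `parse_block` are hypothetical names for illustration; the comment prefixes and line-break formatting of the real output are left out, and the trailing $45 $62 of the hex string is omitted.

```python
# Hypothetical linked entries: byte -> (label, number of parameter bytes).
LINKED = {
    0xF2: ("<create_window>", 3),
    0x37: ("<clear_window>", 1),
    0x45: ("<text_command>", 0),
}


def parse_block(data):
    """Split a raw block into command lines and text lines."""
    lines = []
    text = ""
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in LINKED:
            if text:
                lines.append(text)
                text = ""
            label, nparams = LINKED[byte]
            params = data[i + 1:i + 1 + nparams]
            lines.append(label + "".join(f"<${p:02x}>" for p in params))
            i += 1 + nparams
        else:
            # Assumption for this sketch: anything else is ASCII text.
            text += chr(byte)
            i += 1
    if text:
        lines.append(text)
    return lines


block = bytes([0xF2, 0x01, 0x03, 0x25, 0x37, 0x55, 0x45]) + b"valid text here"
for line in parse_block(block):
    print(line)
```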

Quote
and another feature request: specify a pointer type and possibly a pointer range, specify the text block, and then dump from the text block but search for pointers as you go. so kind of like the RAW method, except you actually get pointers out of it. this is pretty much the method i've been using for my personal custom tools but it would be nice for it to be universally available.

How does that differ from what you can accomplish with the pointer table range here?

http://transcorp.parodius.com/scratchpad/TextAngel.png

Not sure what you mean by searching for pointers.

Quote
depending on how you implement everything, this might be a pain, but one more feature request: same as above, except it sorts the text/#WRITE commands by location of the pointer. so if the text block is World <$00> Hello <$00>, and it finds a pointer to World at $200 and Hello at $100, the output is #W32($100) / Hello<END> / #W32($200) / World<END>, or somesuch.

I guess I'm not getting why you keep saying you're in RAW mode, but you're finding pointers somehow. If you're in RAW mode, the only thing you can derive is knowing where a string end token is and assuming the next string starts right after that. Even then, you've collected all the information. What do you do with it then to 'find' pointers? I think this circles back to non tangible scanning algorithms.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Kajitani-Eizan on August 06, 2010, 07:05:50 pm
Along these lines, I've always thought about a 'rule' based scanner that can find proper pointers based on the rules you feed it about the game's scripting blocks. It's never 'random' spacing between pointers.

yes, it can be "random". especially if the pointers are simply embedded in the code in between functions, or if they just dumped a bunch of data somewhere that happens to include pointers as well as other stuff. that's why you need pointer searching/scanning, or brute force dumps. and you do get something tangible from those -- the output atlas script will have #WRITEs for each pointer it found. you can modify those at will to remove false positives and perhaps add in pointers that the routine might have missed. (though it really shouldn't be missing any, unless they fall outside the pointer search area you specify.)

Quote
How does that differ from what you can accomplish with the pointer table range here?

assuming that screen is showing functionality similar to what is currently in cartographer, it will go through a pointer table and dump the pointers and the text that the pointers point to. that is not the same as what i am saying. what i am saying is for it to go through the text area and find the pointers that point to the text, and dump those pointers and the text.

Quote
I guess I'm not getting why you keep saying you're in RAW mode, but you're finding pointers somehow. If you're RAW mode, the only thing you can derive is knowing where an string end token is and assuming the next string starts right after that. Even then, you've collected all the information. What do you do with it then to 'find' pointers? I think this circles back to non tangible scanning algorithms.

? what is a "tangible" scanning algorithm? here is the sort of algorithm i am proposing and have implemented in my (crappy) tools, and i know others have as well:

1. aha i've found the start of a string.
2. let me make a note of what offset this string starts at.
3. let me search a specified area for pointers that point to this offset.
4. ok, i found it. let me dump that pointer along with the string.
5. ok, next string. goto 2.
6. repeat until end of text block is reached.

if you're confused about how one might search for pointers... for example, the GBA addresses the ROM starting at 0x08000000. so, if my string is at 0x00001000 in the ROM, i can add 0x08000000 to that to get 0x08001000, and search for this value. so i'd start from the beginning of the specified "pointer search area", and look for a group of 32bit-word-aligned bytes that look like 00 10 00 08. if i find that, i have potentially found a pointer to the text at 0x00001000. obviously that particular set of bytes could very easily be a false positive, but something like 56 32 1C 08 is much less likely to be a false positive. and of course you can double check it yourself, manually.
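
The steps above can be sketched in Python roughly as follows. This is only an illustration of the approach described, not code from any of the tools mentioned; `find_pointers` and `scan_text_block` are hypothetical names, the end token is assumed to be $00, and hits can still be false positives that need checking by hand.

```python
def find_pointers(rom, string_offset, search_start, search_end, base=0x08000000):
    """Return word-aligned locations holding a pointer to string_offset."""
    needle = (string_offset + base).to_bytes(4, "little")
    start = (search_start + 3) & ~3  # round up to a 32-bit boundary
    hits = []
    for pos in range(start, min(search_end, len(rom)) - 3, 4):
        if rom[pos:pos + 4] == needle:
            hits.append(pos)
    return hits


def scan_text_block(rom, text_start, text_end, search_start, search_end, end_token=0x00):
    """Walk the text block string by string and look up pointers to each."""
    results = []
    offset = text_start
    while offset < text_end:
        end = rom.find(bytes([end_token]), offset, text_end)
        if end == -1:
            end = text_end
        hits = find_pointers(rom, offset, search_start, search_end)
        results.append((offset, hits, bytes(rom[offset:end])))
        offset = end + 1  # next string is assumed to start after the end token
    return results


rom = bytearray(0x2000)
rom[0x1000:0x100C] = b"World\x00Hello\x00"
rom[0x200:0x204] = (0x08001000).to_bytes(4, "little")  # pointer to "World"
rom[0x100:0x104] = (0x08001006).to_bytes(4, "little")  # pointer to "Hello"
for offset, hits, text in scan_text_block(rom, 0x1000, 0x100C, 0x0, 0x400):
    print(hex(offset), [hex(h) for h in hits], text)
```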
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on August 07, 2010, 09:16:30 pm
yes, it can be "random". especially if the pointers are simply embedded in the code in between functions, or if they just dumped a bunch of data somewhere that happens to include pointers as well as other stuff. that's why you need pointer searching/scanning, or brute force dumps. and you do get something tangible from those -- the output atlas script will have #WRITEs for each pointer it found. you can modify those at will to remove false positives and perhaps add in pointers that the routine might have missed. (though it really shouldn't be missing any, unless they fall outside the pointer search area you specify.)

That's not tangible. That's just a guess from an algorithm. The very fact that you have to manually go back through them attests to that. If you have hard coded pointers in the assembly code, you can locate them in seconds with a debugger and add them to a pointer list. Or if you know the instructions, you can search via instruction pattern. I will support pointer lists. But this is a dumper/inserter, not a scanning or pattern utility for possible pointers. I have no interest in these non tangible methods. I will have only features that result in correct/verifiable/reliable output.

Quote
assuming that screen is showing functionality similar to what is currently in cartographer, it will go through a pointer table and dump the pointers and the text that the pointers point to. that is not the same as what i am saying. what i am saying is for it to go through the text area and find the pointers that point to the text, and dump those pointers and the text.

The results are unreliable guessing based on your searching algorithm.

Quote
? what is a "tangible" scanning algorithm? here is the sort of algorithm i am proposing and have implemented in my (crappy) tools, and i know others have as well:

1. aha i've found the start of a string.
2. let me make a note of what offset this string starts at.
3. let me search a specified area for pointers that point to this offset.
4. ok, i found it. let me dump that pointer along with the string.
5. ok, next string. goto 2.
6. repeat until end of text block is reached.

if you're confused about how one might search for pointers... for example, the GBA addresses the ROM starting at 0x08000000. so, if my string is at 0x00001000 in the ROM, i can add 0x08000000 to that to get 0x08001000, and search for this value. so i'd start from the beginning of the specified "pointer search area", and look for a group of 32bit-word-aligned bytes that look like 00 10 00 08. if i find that, i have potentially found a pointer to the text at 0x00001000. obviously that particular set of bytes could very easily be a false positive, but something like 56 32 1C 08 is much less likely to be a false positive. and of course you can double check it yourself, manually.

There are no tangible scanning algorithms. That was my point. Anything you use will result in something with false positives or misses in the methods you describe. This is not within the scope of the utility I am creating. It fails at step one. In a raw blind dump, you don't know where the start of a string is. You're guessing based on where an end token is or other unreliable means.

Sorry, you'll have to use pointer lists for these or stick to your own tools that do this job so well for you.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Klarth on August 10, 2010, 04:06:33 am
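Some thoughts on the table standard:

  • I agree with the deprecated table entry types such as line break, bookmarks, etc
  • I like the notion of having a table entry that switches tables automatically, but I don't yet understand how many cases your implementation will cover.
  • I don't believe that kanji arrays have a place in a table standard.  I only know of 2-3 games that use this.
  • There are a few spots in the doc where you use /r and /n instead of \r and \n.
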
Other thoughts (not fully related):

I implemented UTF-8 support in Atlas a few days back, so that should be out of the way.  It's not true UTF-8 support, just that the commands, comments, etc have to be ASCII...the actual text portion of Atlas files is considered encoding-neutral because it only needs to be mapped to another binary value, not displayed.

Interest pretty much died in the spreadsheet project, so I put it on the backburner until someone comes up with an XML standard or until early next year.

On the dumper:

It'll be interesting to see what your approach amounts to.  I'm positive it'll be more user-friendly than Atlas (and probably the rest of the inserters/dumpers out there), which is a great thing for people starting out or dealing with simpler games...most people like to play with a gadget out of a box before reading an instruction manual.  I do believe that the approach is going to disappoint when it comes to games more complex than "Here's a relatively neat pointer table and/or here's a nice, neat bank of text.".  But maybe it's been because I've seen too many of Gideon's scripts and too many PSX scripts and know they require game-specific context, which admittedly, Atlas relies on the dumper (or user) for solving.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: DaMarsMan on August 10, 2010, 11:10:33 am
Along these lines, I've always thought about a 'rule' based scanner that can find proper pointers based on the rules you feed it about the game's scripting blocks. It's never 'random' spacing between pointers.

yes, it can be "random". especially if the pointers are simply embedded in the code in between functions, or if they just dumped a bunch of data somewhere that happens to include pointers as well as other stuff. that's why you need pointer searching/scanning, or brute force dumps. and you do get something tangible from those -- the output atlas script will have #WRITEs for each pointer it found. you can modify those at will to remove false positives and perhaps add in pointers that the routine might have missed. (though it really shouldn't be missing any, unless they fall outside the pointer search area you specify.)

Quote
How does that differ from what you can accomplish with the pointer table range here?

assuming that screen is showing functionality similar to what is currently in cartographer, it will go through a pointer table and dump the pointers and the text that the pointers point to. that is not the same as what i am saying. what i am saying is for it to go through the text area and find the pointers that point to the text, and dump those pointers and the text.

Quote
I guess I'm not getting why you keep saying you're in RAW mode, but you're finding pointers somehow. If you're RAW mode, the only thing you can derive is knowing where an string end token is and assuming the next string starts right after that. Even then, you've collected all the information. What do you do with it then to 'find' pointers? I think this circles back to non tangible scanning algorithms.

? what is a "tangible" scanning algorithm? here is the sort of algorithm i am proposing and have implemented in my (crappy) tools, and i know others have as well:

1. aha i've found the start of a string.
2. let me make a note of what offset this string starts at.
3. let me search a specified area for pointers that point to this offset.
4. ok, i found it. let me dump that pointer along with the string.
5. ok, next string. goto 2.
6. repeat until end of text block is reached.

if you're confused about how one might search for pointers... for example, the GBA addresses the ROM starting at 0x08000000. so, if my string is at 0x00001000 in the ROM, i can add 0x08000000 to that to get 0x08001000, and search for this value. so i'd start from the beginning of the specified "pointer search area", and look for a group of 32bit-word-aligned bytes that look like 00 10 00 08. if i find that, i have potentially found a pointer to the text at 0x00001000. obviously that particular set of bytes could very easily be a false positive, but something like 56 32 1C 08 is much less likely to be a false positive. and of course you can double check it yourself, manually.

I've also had this problem where the text chunk is in order but the pointers are out of order. Dumping from the table gives a very confusing script sometimes. Not to mention that you can have duplicate text chunks. It would be a great option to have. 
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on August 10, 2010, 11:55:46 am
Thanks for the feedback. It would be great to have you on board with it if it will ever see use beyond my personal stuff. The 'official' syntax is '\r' and '\n'. The others were typos. Will fix. Thanks. Speaking of this, I don't think we need '\t'. Actual tabs should be able to be used freely in the table entries themselves, so it seems unnecessary.

Some thoughts on the table standard:

  • I agree with the deprecated table entry types such as line break, bookmarks, etc
  • I like the notion of having a table entry that switches tables automatically, but I don't yet understand how many cases your implementation will cover.
  • I don't believe that kanji arrays have a place in a table standard.  I only know of 2-3 games that use this.
  • There are a few spots in the doc where you use /r and /n instead of \r and \n.

I left the table switch part a little open ended because I believe the details of that should be left to the utility to cover the broadest amount of cases. From the table format's perspective, I felt that it only needed to know that there is a table switch, not necessarily the details on how it's used. Standardizing how it's used might lead to standardization of dumpers/inserters? I'm not sure how much the table file itself should dictate on this.

One example I had in there suggested that it will switch to the next table and then stay there until it hits the switch again. However, how long the next table is used before switching back is really undefined because it probably differs game to game.  Some switch for just the next byte. Some switch until the switch is seen again. Some switch until the end of the string. Some switch on other conditions.

Now that I think about it, perhaps a similar syntax to the Kanji Array feature might work where it's specified how long to be in the next table before switching back. 1 character, multiple characters, or infinite until next switch is encountered. Just throwing it out there. The details might still be better left out.

I thought it best only to define a switch is there and leave the implementation details out of the table file.
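
To make the different switch behaviours concrete, here is a hedged Python sketch of a decoder with one switch byte and a `duration` setting: a number of characters to read from the second table before switching back, or `None` to stay there until the switch is seen again. The function, its parameters, and the sample tables are all hypothetical; nothing here is part of the table file draft.

```python
def decode(data, main, alt, switch_byte, duration=None):
    """Decode bytes using two tables and a table-switch byte.

    duration=None: toggle tables each time the switch byte is seen.
    duration=N:    use the alt table for N characters, then switch back.
    """
    out = []
    in_alt = False
    remaining = 0
    for byte in data:
        if byte == switch_byte:
            if duration is None:
                in_alt = not in_alt
            else:
                in_alt = True
                remaining = duration
            continue
        table = alt if in_alt else main
        # Unknown bytes fall back to a raw hex tag.
        out.append(table.get(byte, f"<${byte:02X}>"))
        if in_alt and duration is not None:
            remaining -= 1
            if remaining == 0:
                in_alt = False
    return "".join(out)


main = {0x01: "a", 0x02: "b"}
alt = {0x01: "X", 0x02: "Y"}
data = bytes([0x01, 0xFF, 0x01, 0x02, 0xFF, 0x01])
print(decode(data, main, alt, 0xFF, duration=1))     # switch for one character
print(decode(data, main, alt, 0xFF, duration=None))  # switch until seen again
```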


I generally agree with you on the Kanji array, but hear me out. First, it was considered only because I considered possible compatibility with all previously used utility table features (Romjuice had this one).  I proposed including this because it fits as a feature in the abstraction model we have. We don't have to examine any individual characters or get into game specifics.  It's a feature that aids only in obtaining the correct table value. I thought it could also represent an example of how we might add features to the table format to make it grow and support more while retaining our abstraction level.

Thoughts now that you've heard explanations?

Quote
On the dumper:

It'll be interesting to see what your approach amounts to.  I'm positive it'll be more user-friendly than Atlas (and probably the rest of the inserters/dumpers out there), which is a great thing for people starting out or dealing with simpler games...most people like to play with a gadget out of a box before reading an instruction manual.  I do believe that the approach is going to disappoint when it comes to games more complex than "Here's a relatively neat pointer table and/or here's a nice, neat bank of text.".  But maybe it's been because I've seen too many of Gideon's scripts and too many PSX scripts and know they require game-specific context, which admittedly, Atlas relies on the dumper (or user) for solving.

My development version is already more advanced than every other available dumper I know of.  I'd post more screenshots, but I keep changing the interface and revamping some of my design every other week. It supports things like multi-file dumping/inserting, pascal strings, full fixed length string/line support, and enhanced most features from Cartographer. Atlas compatible output has not been added yet. I'm still deciding how far I will go with that. Different situations will use different appropriate commands. Basic stuff will use autowrite, debug mode will use individual string writes, fixed length and pascal have their own deal, etc.

Disappointment will be true if you expect features with unreliable results such as Kajitani-Eizan would like, or abilities that better belong in a different utility such as pointer list generation. I'm all for expanding its capabilities in a 'smarter'/reliable manner. What I mean by that is some of the possibilities I mentioned a few posts up that might aid in extracting text or parsing more complicated scripting blocks. Anything that produces definable, reliable results.

Native insertion (other than Atlas compatible output) is shaping up to be done via the same interface. Basically, if you load up the same config file you dumped with (or enter all the same information), the utility should be able to insert anything it dumped. Insertion is still in its infancy though, but that's the way it's shaping up.

At the end of the day, it's never going to be a be-all-end-all, nor was it ever intended to be. Some people will be disappointed and not talk to me. Some people will bake me cookies. I started it as a project for myself so I didn't need to bother with several variations on custom dumpers for some common situations that none of the currently available tools did the job right with. I also thought it would be a good opportunity to promote some utility compatibility and standardization. It just so happens others are interested in sharing in the results and I think I can give them something better (in some areas) than what we have, if nothing else. :)

With that said, I will certainly welcome any ideas or thoughts you have. I will think about them and consider them. I can't promise I will use them all, and I can't promise you won't be disappointed! ;D
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on August 10, 2010, 11:59:24 am
I've also had this problem where the text chunk is in order but the pointers are out of order. Dumping from the table gives a very confusing script sometimes. Not to mention that you can have duplicate text chunks. It would be a great option to have.

Maybe I'm just not understanding what he was saying. What you just said there is something I would consider. Re-ordering the whole thing by text location rather than pointer order. Checking for duplicates could be done as well. That stuff can be done with no scanning at all.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Kajitani-Eizan on August 10, 2010, 02:43:56 pm
That's not tangible. That's just a guess from an algorithm. The very fact that you have to manually go back through them attests to that.

i think i see what you're saying, but i cannot remotely agree with it. you're saying that you wish for your program output to be "perfect". but that requires that the user put excessive effort into making sure the program input is "perfect". for example, you want to be sure that every pointer your program attempts to dump is in fact a pointer to valid text, but this requires that the user spend excessive effort either building many many tiny pointer tables, or building a pointer list, or writing their own utility that has basically the same functionality as your program but supports this sort of pointer searching to build a pointer list.

i don't see what the point of that is when the point is to make a program that makes things easier for the user. your program gains "perfection" (which i don't think is important to begin with), but at the cost of the whole point of doing it in the first place. romhacking by its nature is ugly; it was never supposed to be a process that's 100% "perfect" 100% of the way.

Quote
If you have hard coded pointers in the assembly code, you can locate them in seconds with a debugger and add to a pointer list. Or if you know the instructions, you can search via instruction pattern.

the former is not practical when dealing with large numbers of pointers/strings. the latter is not reliable in the general case at all... isn't your issue with brute force and pointer searching one of reliability?

Quote
But this is a dumper/inserter, not a scanning or pattern utility for possible pointers.

those go hand in hand. as stated above, a scanning utility for pointers will do much the same stuff a dumper would do... it seems odd to separate the two into separate programs.

Quote
In a raw blind dump, you don't know where the start of a string is. You're guessing based on where an end token is or other unreliable means.

often you'll have nothing between strings other than nulls. in that case, that makes start-of-string detection pretty reliable. 100% reliable, in fact. and if you don't, well... the raw blind dump itself would not be "perfect" then, as the user would have to manually go back and filter out the chaff from the wheat. that is to say, the raw blind dump (your program output) would not be 'tangible'.

and in fact, if the only other bytes embedded between strings are bytes with small values (out of range of the table) or pointers (which you can sense), it's still 100% reliable even with data in between strings.

Quote
or stick to your own tools that do this job so well for you

my, no need to be snippy

Maybe I'm just not understanding what he was saying. What you just said there is something I would consider. Re-ordering the whole thing by text location rather than pointer order. Checking for duplicates could be done as well. That stuff can be done with no scanning at all.

what i was saying does not quite equate to this, though it can be similar. but yeah, this would be a cool feature.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on August 10, 2010, 03:23:23 pm
We're getting closer to understanding one another. There's hope, but still a gap remains!  ;D

i think i see what you're saying, but i cannot remotely agree with it. you're saying that you wish for your program output to be "perfect". but that requires that the user put excessive effort into making sure the program input is "perfect". for example, you want to be sure that every pointer your program attempts to dump is in fact a pointer to valid text, but this requires that the user spend excessive effort either building many many tiny pointer tables, or building a pointer list, or writing their own utility that has basically the same functionality as your program but supports this sort of pointer searching to build a pointer list.

That's not really what I'm trying to say. You want to work from unknown, I want to work from known. That's sort of the difference. It's probably best to discuss some specific cases to illustrate.

Quote
the former is not practical when dealing with large numbers of pointers/strings. the latter is not reliable in the general case at all... isn't your issue with brute force and pointer searching one of reliability?

You caught me with the instruction pattern reliability. Indeed it is not. However, when the assembly code loads the pointer, catching it in the debugger is probably the only thing remotely practical. Let's say your pointer is called like this:

lda #$cb
ldx #$b654
jsr $4567

This is just one scenario on one platform. How are you going to scan and find any pointers in that fashion? With the opcodes separating it, it's not going to turn up in any pointer search you can possibly do unless you had wild cards to account for the unknown opcode. Snapping the debugger would catch them immediately in all cases with full certainty. Create a list, and you're done.

Quote
those go hand in hand. as stated above, a scanning utility for pointers will do much the same stuff a dumper would do... it seems odd to separate the two into separate programs.

That depends on the scan. A simple scan based on end of strings and a pointer format is reasonably related. But if you're going to blow it up into something smarter such as being able to parse scripting blocks via commands, work with instruction patterns etc., it becomes its own utility in my opinion. It's much more than a side feature to add to a dumper.

Quote
often you'll have nothing between strings other than nulls. in that case, that makes start-of-string detection pretty reliable. 100% reliable, in fact. and if you don't, well... the raw blind dump itself would not be "perfect" then, as the user would have to manually go back and filter out the chaff from the wheat. that is to say, the raw blind dump (your program output) would not be 'tangible'.

and in fact, if the only other bytes embedded between strings are bytes with small values (out of range of the table) or pointers (which you can sense), it's still 100% reliable even with data in between strings.

This is the most trivial of cases where the text would be so nice as to flow completely together after an end string. I'll give you that case. Any other case and you're searching in vain because you don't know where the text really starts. It seems like a better idea to me to work with what you know so a utility can intelligently find text or parse blocks. Maybe an example case would better clear the issue between us. Can you make up an example?

To be useful, the raw blind dump mode is really for those one-off times when you want to dump something without need of any pointers. Maybe a few one-off strings or menu items, or a blob of text or something you want to take a peek at in remotely more readable format. You're right, if you try to dump large chunks with it, it's probably not doing anything reliable or useful for that matter.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: DaMarsMan on August 10, 2010, 03:51:40 pm
You caught me with the instruction pattern reliability. Indeed it is not. However, when the assembly code loads the pointer, catching it in the debugger is probably the only thing remotely practical. Let's say your pointer is called like this:

lda #$cb
ldx #$b654
jsr $4567

This is just one scenario on one platform. How are you going to scan and find any pointers in that fashion? With the opcodes separating it, it's not going to turn up in any pointer search you can possibly do unless you had wild cards to account for the unknown opcode. Snapping the debugger would catch them immediately in all cases with full certainty. Create a list, and you're done.

In my opinion, situations like this are best handled by an emulation core. I've seen examples in byuu's source of doing things like this. You can set the initial CPU registers and then loop through a table with given values. Doing this, you could set a breakpoint at the beginning of the $4567 function and set back an instruction to dump the offset of the ldx. I'm not sure how possible that is.

I coded a utility called pointer grabber that's in the utilities section of this site. It's not perfect and needs to be customized but what it does is scan for pattern like "JSR $4567" and then dumps the 2 bytes preceding it. Obviously you get false positives with it but you can usually weed them out pretty easily. Lots of times games seem to have all these functions in the same clump so if you disassemble that bank, you'll be able to see them all right next to each other and go through by hand. It's painful, but may be a more accurate approach.
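
A rough Python sketch of that kind of scan, for illustration only. It is not the actual pointer grabber source; the function name and the hand-assembled sample bytes are made up here, and it assumes a headerless ROM image where JSR absolute is opcode $20 followed by a little-endian address.

```python
def scan_jsr_operands(rom: bytes, target: int) -> list[tuple[int, int]]:
    """Find every 'JSR target' and return (offset_of_jsr, preceding_word)."""
    # JSR absolute on 6502/65816: opcode $20, then the address low byte first.
    pattern = bytes([0x20, target & 0xFF, (target >> 8) & 0xFF])
    hits = []
    # Start at offset 2 so there are always two bytes before the match.
    pos = rom.find(pattern, 2)
    while pos != -1:
        # The two bytes before the JSR are the candidate low word.
        word = rom[pos - 2] | (rom[pos - 1] << 8)
        hits.append((pos, word))
        pos = rom.find(pattern, pos + 1)
    return hits


# lda #$cb / ldx #$b654 / jsr $4567, assembled by hand (8-bit A, 16-bit X).
sample = bytes([0xA9, 0xCB, 0xA2, 0x54, 0xB6, 0x20, 0x67, 0x45])
for offset, word in scan_jsr_operands(sample, 0x4567):
    print(f"JSR at {offset:#06x}, preceding word ${word:04X}")
```

Every hit is only a candidate. Any three bytes that happen to read 20 67 45 will match, which is where the false positives come from.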


So, although that should probably be externally handled, here is at least what I'd prefer:

Option for dumping based on string offset order:
At the beginning of each string dump, store the offset into a table for lookup. Then, if the option is selected, you can organize and output by that later.

Option to remove duplicate strings:
If the offsets were stored in a table as stated above, you could simply group all pointers that pointed to the same offset. This way you can get a much cleaner script. I can see situations where you would want to NOT remove duplicates but I think it would be an awesome feature.
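
Both options could be sketched roughly like this in Python. The names and the sample table are hypothetical; it assumes the dumper has already recorded, for each pointer location, the string offset it resolved to.

```python
from collections import defaultdict


def group_by_string_offset(pointers: dict[int, int]) -> list[tuple[int, list[int]]]:
    """pointers maps pointer location -> offset of the string it points to.

    Returns one entry per unique string, ordered by string offset, with
    every pointer location that references it.
    """
    groups = defaultdict(list)
    for ptr_loc, str_off in pointers.items():
        groups[str_off].append(ptr_loc)
    return [(off, sorted(locs)) for off, locs in sorted(groups.items())]


# Pointer table out of order, with two pointers sharing one string.
table = {0x100: 0x8040, 0x102: 0x8000, 0x104: 0x8040, 0x106: 0x8020}
for off, locs in group_by_string_offset(table):
    print(f"string @ ${off:04X} <- pointers at {[hex(p) for p in locs]}")
```

Keeping the list of pointer locations per string means an inserter could still rewrite every duplicate pointer, and turning the option off is just a matter of iterating the original table instead.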
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Kajitani-Eizan on August 10, 2010, 05:00:35 pm
Let's say your pointer is called like this:

lda #$cb
ldx #$b654
jsr $4567

This is just one scenario on one platform. How are you going to scan and find any pointers in that fashion? With the opcodes separating it, it's not going to turn up in any pointer search you can possibly do unless you had wild cards to account for the unknown opcode. Snapping the debugger would catch them immediately in all cases with full certainty. Create a list, and you're done.

i am not sure what is going on there. is it 6502 asm? unfortunately i do not have experience with that. where is the pointer? regardless, unless there are pointer types i am grossly misunderstanding, if you can write it using #WRITE, you can search for it. for example, if b654 is the pointer in the example above, i don't see why you can't search for b654. you might end up getting more false positives than you'd like, since this isn't a four-byte pointer like i usually would work with and so there's less specificity, but you ought to be able to find it.

and if you would like a rough example of how data can be interspersed wherever (making my method useful):

Code: (ARM assembly) [Select]
...

ldr r0,=String1Pointer
ldr r0,[r0]
ldr r1,=SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1
ldr r2,=SomeOtherCrapDataAssociatedWithTheMenuOrWhatever2

...

pop r4-r7,r15   ; end of function

push r4-r7,r14 ; start of next function

...

ldr r0,=String2Pointer
ldr r0,[r0]

...

pop r4-r7,r15   ; end of function

push r4-r7,r14 ; start of next function

...

ldr r0,=String3Pointer
ldr r0,[r0]
ldr r1,=SomeOtherCrapDataAssociatedWithTheMenuOrWhatever3
ldr r1,[r1]

...

pop r4-r7,r15   ; end of function

; Data area

<String1Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever2>
<String2Pointer>
<String3Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever3>

probably a crappy example, but the point is that the data area will not have a set pattern you can exploit. you either brute force or pointer search it (the easy ways to do it) or you spend forever building a pointer list. (in this example with only 3 pointers, it's not bad, but if there are a lot then it becomes moderately to excessively tedious.) alternatively, instead of the data area being at the end of all the functions, there could be a small data area at the end of each function:

Code: [Select]
(Start Function 1)
...
(End Function 1)
<String1Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever2>

(Start Function 2)
...
(End Function 2)
<String2Pointer>

(Start Function 3)
...
(End Function 3)
<String3Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever3>

in this case, you either pointer search it (the easy way) or you spend forever building a pointer list.

Quote
That depends on the scan. A simple scan based on end of strings and a pointer format is reasonably related. But if you're going to blow it up into something smarter such as being able to parse scripting blocks via commands, work with instruction patterns etc., it becomes its own utility in my opinion. It's much more than a side feature to add to a dumper.

yeah, i was thinking of how cartographer already lets you specify how the pointers are set up. so the pointer format information is already being input into the program. all you'd have to do is generate pointers, and then search for those values, instead of just reading pointers.

not sure what you mean on that last part. what commands and instruction patterns are you talking about? i thought we're just talking about dumping text (and its associated pointers) from a game.

Quote
This is the most trivial of cases where the text would be so nice as to flow completely together after an end string. I'll give you that case. Any other case and you're searching in vain because you don't know where the text really starts. It seems like a better idea to me to work with what you know so a utility can intelligently find text or parse blocks. Maybe an example case would better clear the issue between us. Can you make up an example?

isn't that trivial case a common case? obviously if it's just way too complicated then this method wouldn't be usable for a general-purpose script ripper, e.g.:

Code: (rather tough to tell where strings start) [Select]
56 23 1C 08  (  S  W  D  )  L  o  n  g     S  w  o  r  d 00 84 23 1C 08  (  S  W  D  )  B  r  o  a  d     S  w  o  r  d 00

^-- a pointer to something
             ^-- some kind of identifier, maybe something points here too
                            ^-- the main string; there is a pointer to this somewhere

but if your script looks like this:

Code: (easy to tell where strings start) [Select]
L  o  n  g     S  w  o  r  d 00 00 00 00 00  B  r  o  a  d     S  w  o  r  d 00
^-- pointer pointing here
                                             ^-- pointer pointing here

then it's super easy to figure out where a string starts. once you know where it starts, and the user inputs information on the pointer format (e.g. endianness, bytewidth, offset, etc.), you know what the pointer pointing to it must look like. once you know that, you can search for it.
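
a minimal python sketch of that "pointer search" flow, under some loud assumptions: strings are separated only by terminator bytes, pointers are absolute and stored contiguously, and plain ASCII stands in for a real table file. the function names, the GBA-style base of 0x08000000 and the sample layout are all made up for illustration.

```python
def string_starts(rom: bytes, start: int, end: int, terminator: int = 0x00) -> list[int]:
    """Offsets in [start, end) where a string begins, i.e. a non-terminator
    byte at the region start or directly after a terminator byte."""
    starts = []
    prev_was_term = True
    for off in range(start, end):
        is_term = rom[off] == terminator
        if not is_term and prev_was_term:
            starts.append(off)
        prev_was_term = is_term
    return starts


def find_pointers(rom, str_off, base, width=4, byteorder="little", search=(0, None)):
    """Offsets inside the search window holding a pointer to str_off."""
    needle = (str_off + base).to_bytes(width, byteorder)
    lo, hi = search
    hi = len(rom) if hi is None else hi
    hits = []
    pos = rom.find(needle, lo, hi)
    while pos != -1:
        hits.append(pos)
        pos = rom.find(needle, pos + 1, hi)
    return hits


base = 0x08000000
text_off = 0x10
text = b"Long Sword" + b"\x00" * 5 + b"Broad Sword\x00"
# Two pointers (stored out of order), eight filler bytes, then the text.
ptrs = (base + text_off + 15).to_bytes(4, "little") + (base + text_off).to_bytes(4, "little")
rom = ptrs + b"\xff" * 8 + text

for off in string_starts(rom, text_off, len(rom)):
    print(hex(off), [hex(p) for p in find_pointers(rom, off, base, search=(0, text_off))])
```

a match is still only a candidate: nothing here proves that the four matching bytes are really a pointer rather than data that happens to have the same value.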
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on August 10, 2010, 06:00:35 pm
i am not sure what is going on there. is it 6502 asm? unfortunately i do not have experience with that. where is the pointer? regardless, unless there are pointer types i am grossly misunderstanding, if you can write it using #WRITE, you can search for it. for example, if b654 is the pointer in the example above, i don't see why you can't search for b654. you might end up getting more false positives than you'd like, since this isn't a four-byte pointer like i usually would work with and so there's less specificity, but you ought to be able to find it.

65816. No, the pointer is $cbb654. The pointer load is separated across multiple opcodes. The pointer bytes are separated by opcodes. Along these lines often times the bank portion of the pointer is stored completely separate from the lower word. You can't write it using a single #WRITE. Understand now? Things like this are very common on my platform of expertise in which the methods you've described aren't useful. So now let's talk about yours.
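
To illustrate why a plain byte search misses this, here is a small Python sketch that matches the instruction sequence with wildcards and reassembles the 24-bit pointer. It is only a sketch: the pattern is hard-wired to the three instructions above, it assumes 8-bit accumulator and 16-bit index immediates (which on the 65816 depends on the processor flags at that point), and the function name is made up.

```python
import re

# lda #imm8 ; ldx #imm16 ; jsr $4567, with the immediates as wildcards.
# $A9 = LDA immediate, $A2 = LDX immediate, $20 = JSR absolute.
PATTERN = re.compile(rb"\xA9(.)\xA2(..)\x20\x67\x45", re.DOTALL)


def find_split_pointers(rom: bytes) -> list[tuple[int, int]]:
    """Return (offset_of_sequence, 24-bit pointer) for every match."""
    results = []
    for m in PATTERN.finditer(rom):
        bank = m.group(1)[0]                        # from lda #$cb
        low = int.from_bytes(m.group(2), "little")  # from ldx #$b654
        results.append((m.start(), (bank << 16) | low))
    return results


sample = bytes([0xA9, 0xCB, 0xA2, 0x54, 0xB6, 0x20, 0x67, 0x45])
for offset, pointer in find_split_pointers(sample):
    print(f"sequence at {offset:#x} loads ${pointer:06X}")
```

The bytes CB, 54 and B6 never sit next to each other in the ROM, so searching for the pointer value itself finds nothing, and writing it back takes two separate writes.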

Quote
and if you would like a rough example of how data can be interspersed wherever (making my method useful):

Code: (ARM assembly) [Select]
...
<String1Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever2>
<String2Pointer>
<String3Pointer>
<SomeOtherCrapDataAssociatedWithTheMenuOrWhatever3>

probably a crappy example, but the point is that the data area will not have a set pattern you can exploit. you either brute force or pointer search it (the easy ways to do it) or you spend forever building a pointer list. (in this example with only 3 pointers, it's not bad, but if there are a lot then it becomes moderately to excessively tedious.)

I understand how the pointer can be found here in the assembly code if you had a matching address. You have your full 4-byte pointer right there non-separated. That's different from the case I described above. Easier. However, I do have further questions. Where do you start dumping? The assembly code? The data area? I'm assuming it's the data area. If it's the data area, where do you scan for a pointer match? The entire ROM? You start finding strings in the data area. You get to the end of <String1Pointer>'s end terminator. Now you mark the start of <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>. Now what? Your search fails. So you continue on searching for every location until you hit a match again at <String2Pointer>? If I understand this correctly, this is prone to horrible error. Your pointer match could find the correct instruction. It could find <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1> that happened to have the same bytes. (4-bytes is probably unlikely but possible.) There could be two matches, one is the real, one is a fake.

And as I mentioned before, none of this even remotely works if the pointer isn't all together nice and neat.

Quote
isn't that trivial case a common case? obviously if it's just way too complicated then this method wouldn't be usable for a general-purpose script ripper, e.g.:

Not that common on my platforms of expertise. I find quite a bunch of separated pointers and short relative pointers. I suppose it's more common on GBA and later systems that always have neat absolute 32-bit values together like that. That's really the only case I can see this working for.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Kajitani-Eizan on August 10, 2010, 07:11:32 pm
65816. No, the pointer is $cbb654. The pointer load is separated across multiple opcodes. The pointer bytes are separated by opcodes. Along these lines often times the bank portion of the pointer is stored completely separate from the lower word. You can't write it using a single #WRITE. Understand now? Things like this are very common on my platform of expertise in which the methods you've described aren't useful.

? okay, so then how do you define the pointer in cartographer? i mean, can you use cartographer at all to rip from this pointer? if not, then this is outside the realm of our discussion, as the only way to handle it is to dump RAW and handle the repointing manually; you can't use pointer lists or whatever else to do it either. if cartographer can handle it, why wouldn't you be able to use the same pointer definition to search for the pointer?

Quote
The data area? I'm assuming it's the data area. If it's the data area, where do you scan for a pointer match? The entire ROM?

do you mean for the "brute force" method or the "pointer search" method? for the former, you are already in the pointer area; there is nothing to scan. you would follow each potential pointer and dump the potential string it points to. for the latter, the data area itself is the area you want to scan. the dumping area is where the text is.

i'm not sure i understand the rest of your analysis... i think my example may not have been clear enough. that, or the ASM is confusing. quick clarification:

in the function:
ldr r0,=String1Pointer  <-- load r0 with the address of String1Pointer.
ldr r0,[r0] <-- load the data at r0 into r0. (so, load the absolute pointer to the string into r0)

in the data area:
<String1Pointer> - four bytes. absolute pointer to the string.

in the text/dumping area:
S  t  r  i  n  g  1 00    <-- the string being pointed to by String1Pointer.
S  t  r  i  n  g  2 00
S  t  r  i  n  g  3 00 , etc.

for the "brute force" method, you iterate through the data area and follow any pointers you find there to their strings, and dump the pointers and strings. it's basically exactly what cartographer does right now with dumping from a pointer table, except it shouldn't crash when it runs into an invalid pointer (e.g. <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>); it just notes an error and continues to execute. for the "pointer search" method, you iterate through the strings in the text/dumping area and search in a specified area for pointers that point to these strings. the specified area should encompass the data area(s). then you can dump any pointers you find this way along with the strings. (i initially drew a comparison between this and RAW dump mode because you are dumping from the text, not the pointers.)
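
a rough python sketch of the "brute force" pass, just to make the idea concrete. it is not how cartographer works internally; the names, the 0x08000000 base and the sample layout are assumptions, and "valid" here only means "lands inside the known text area and hits a terminator".

```python
def brute_force_dump(rom, data_start, data_end, base, text_start, text_end,
                     width=4, step=4, terminator=0x00):
    """Treat every `width` bytes in the data area as a possible pointer.

    Returns (found, rejected): found holds (pointer_loc, string_off, raw_bytes),
    rejected holds the locations whose value did not lead to a string.
    """
    found, rejected = [], []
    for loc in range(data_start, data_end - width + 1, step):
        value = int.from_bytes(rom[loc:loc + width], "little")
        target = value - base
        if not (text_start <= target < text_end):
            rejected.append(loc)  # note the error and keep going
            continue
        end = rom.find(bytes([terminator]), target, text_end)
        if end == -1:
            rejected.append(loc)
            continue
        found.append((loc, target, rom[target:end]))
    return found, rejected


base = 0x08000000
data = ((base + 0x10).to_bytes(4, "little")     # String1Pointer
        + (3).to_bytes(4, "little")             # some other data
        + (base + 0x18).to_bytes(4, "little"))  # String2Pointer
rom = data + b"\xff" * 4 + b"String1\x00String2\x00"

found, rejected = brute_force_dump(rom, 0, 12, base, 0x10, len(rom))
print(found, rejected)
```

setting `step=1` checks every offset instead of only aligned ones, at the price of more false hits. a value that lands in the middle of a string is also accepted by this sketch, so the output still needs a human pass.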

i hope the above clears it up. if not, please let me know.

Quote
And as I mentioned before, none of this even remotely works if the pointer isn't all together nice and neat.

a) i'm still not totally convinced of this (see above), and b) i don't think it's relevant even if it doesn't work in that case. you would only use the option in cases where you want it/it works, right? unless you're saying that it would be useful only in rare cases, and thus not worth including... which i assume may be true of systems in your area of expertise, but is not true for other systems (e.g. GBA/NDS).

Quote
Not that common on my platforms of expertise. I find quite a bunch of separated pointers and short relative pointers. I suppose it's more common on GBA and later systems that always have neat absolute 32-bit values together like that. That's really the only case I can see this working for.

you can search for relative pointers as well. by separated pointers, do you mean where the load is split across two instructions or where the pointers are really far apart? if the former, see above. if the latter, pointer search excels in that very instance.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on August 11, 2010, 08:57:24 am
? okay, so then how do you define the pointer in cartographer? i mean, can you use cartographer at all to rip from this pointer? if not, then this is outside the realm of our discussion, as the only way to handle it is to dump RAW and handle the repointing manually; you can't use pointer lists or whatever else to do it either. if cartographer can handle it, why wouldn't you be able to use the same pointer definition to search for the pointer?

You can't define or dump it in Cartographer. You CAN add the pointer to a pointer list and dump that way. Writing the pointer back is more difficult, but Atlas can do it with some of its more advanced commands separating the write of upper and lower bytes of the pointer to different spots.

This was to exemplify a case off the top of my head your method wouldn't work for. Your method has limited application to absolute pointers where all bytes that make up the pointer are stored together. I'd prefer something with a more widespread application.
 
Quote
i'm not sure i understand the rest of your analysis... i think my example may not have been clear enough. that, or the ASM is confusing. quick clarification:

in the function:
ldr r0,=String1Pointer  <-- load r0 with the address of String1Pointer.
ldr r0,[r0] <-- load the data at r0 into r0. (so, load the absolute pointer to the string into r0)

in the data area:
<String1Pointer> - four bytes. absolute pointer to the string.

in the text/dumping area:
S  t  r  i  n  g  1 00    <-- the string being pointed to by String1Pointer.
S  t  r  i  n  g  2 00
S  t  r  i  n  g  3 00 , etc.

for the "brute force" method, you iterate through the data area and follow any pointers you find there to their strings, and dump the pointers and strings. it's basically exactly what cartographer does right now with dumping from a pointer table, except it shouldn't crash when it runs into an invalid pointer (e.g. <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>); it just notes an error and continues to execute. for the "pointer search" method, you iterate through the strings in the text/dumping area and search in a specified area for pointers that point to these strings. the specified area should encompass the data area(s). then you can dump any pointers you find this way along with the strings. (i initially drew a comparison between this and RAW dump mode because you are dumping from the text, not the pointers.)

i hope the above clears it up. if not, please let me know.

I think I am clear, but I still think it is extremely limited in application. Let's talk 'brute force'.

1. Ok, you start in the data area. You dump what's pointed to by 4-byte <String1Pointer>. Move to the next 'pointer' which is <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>. The program won't  crash on an 'invalid' pointer. However, what if this data is 1 byte long then the real pointers continue after that. You were dumping by 4-byte long pointers. The entire thing is thrown completely off on the first invalid pointer onwards unless the pointers and data are 4-byte aligned.

Now 'pointer search' method.

2. I see a little more merit in this method. However, this goes back to the first paragraph where it's only going to work where the pointers are absolute and bytes stored together. I believe that to be a very limited application overall. Maybe it happens more often with GBA/NDS, but I don't see that as often on other platforms.

Quote
a) i'm still not totally convinced of this (see above), and b) i don't think it's relevant even if it doesn't work in that case. you would only use the option in cases where you want it/it works, right? unless you're saying that it would be useful only in rare cases, and thus not worth including... which i assume may be true of systems in your area of expertise, but is not true for other systems (e.g. GBA/NDS).

Yes, that's what I'm saying. It's useful only in specific cases that aren't too common on some platforms, but more common on others. I'd rather work on something with more wide spread, flexible application.

Quote
you can search for relative pointers as well. by separated pointers, do you mean where the load is split across two instructions or where the pointers are really far apart? if the former, see above. if the latter, pointer search excels in that very instance.

You can't search for relative pointers when you don't know what they're relative to. Games I've seen that use this have a base value they use and calculate the actual pointer by manipulating a relative pointer and adding it to the base. What is the base? You wouldn't know without looking through the code. It may be as simple as the start of the pointers or text, it may be more complicated than that. What is the manipulation? It may be a shift operation, it may be more.
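
As a small illustration of why the base and the manipulation matter, here is a Python sketch of one possible scheme: actual address = base + (stored value shifted left). The scheme, the function names and the numbers are all hypothetical; a real game may do something quite different, which is the point.

```python
def resolve_relative(raw: int, base: int, shift: int = 0) -> int:
    """Address a stored relative pointer refers to under this scheme."""
    return base + (raw << shift)


def encode_relative(target: int, base: int, shift: int = 0, width: int = 2) -> bytes:
    """Bytes you would have to search for to find a pointer to `target`."""
    delta = target - base
    if delta < 0 or delta & ((1 << shift) - 1):
        raise ValueError("target is not reachable with this base and shift")
    raw = delta >> shift
    if raw >= 1 << (8 * width):
        raise ValueError("offset does not fit in the pointer width")
    return raw.to_bytes(width, "little")


# Same string address, two different guesses at the scheme.
target = 0x0B8246
print(encode_relative(target, base=0x0B8000, shift=1).hex())  # stored as 0x0123
print(encode_relative(target, base=0x0B8200, shift=0).hex())  # stored as 0x0046
```

The bytes to search for change completely with the base and shift, so without getting those from the code first, a search has nothing reliable to look for.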

I meant the load AND the pointer bytes themselves are split. the load is across multiple instructions and the pointer bytes themselves are stored separately and combined by the instructions.

To reiterate, I see a very limited and specific application to your method. I'd much rather see a more program intelligent way to address these situations that would handle your case and some of the ones I have brought up. Off the top of my head, a pointer list generating program with some ability for patterns, wild cards, equations, and searching might be what the doctor ordered. Think along the lines of relative searching programs, but applied to finding and handling complex mixes of pointers and strings. That's one way I envision most of these cases being able to be handled.

Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Klarth on August 11, 2010, 11:28:30 am
I think the community overall could benefit from a quality pointer scanning / pointer table finding feature.  I'll give my view and experience regarding this.

Nightcrawler's sample is very much the same case that Cless wrote a program for in PS-EXE text dumping (many PS-EXE strings use hard coded pointers, which are split between multiple instructions). It was basically an instruction interpreter that output an Atlas script with separate high and low word writes for 32 bit pointers. I agree with Nightcrawler that this is an example of something that should be its own utility rather than a niche dumper feature.

However, I do believe a generic pointer scan feature should be included with a good dumper, even with the limitations discussed.  There are two basic steps in script dumping: script interpretation and character encoding.  Character encoding is the obvious step, but people tend to forget about the script interpretation part, which is the weak part of currently available generic dumpers.  Script interpretation involves various aspects like detecting end string values (to print out debug info in the script or for formatting), ignoring padding bytes for boundary aligned strings, or interpreting other various control codes necessary for dumping.  A pointer scanning utility definitely needs to be able to detect the address that the string begins at in order to validate a pointer as being meaningful.  At this point, such utility is a table library shy of becoming a text dumper.

The only issue is how feature filled a generic pointer scanner should be. I think a simple formula (or just adding/subtracting a constant) would work well for most cases where pointers are not split.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Kajitani-Eizan on August 11, 2010, 11:38:56 am
ah, i wrote this up before klarth's post... but i think most of it should still be fresh. apologies if i'm wrong about this... but even the PS-EXE case can be handled and be made much more convenient using pointer search, if there are a lot of pointers that use the same high word. you'd still have to do some shenanigans for when pointers change high words, and then hand-modify the atlas scripts to include the high word/low word thing, but it would still be useful to use pointer search. obviously a custom tool would be preferable, though.

Quote
You can't define or dump it in Cartographer. You CAN add the pointer to a pointer list and dump that way.

that doesn't make any sense. if you can put it in a pointer list, you can search for it. (or did you simply mean that cartographer doesn't currently support it, but your new program will?)

Quote
Your method has limited application to absolute pointers where all bytes that make up the pointer are stored together. I'd prefer something with a more widespread application.

???? this is not limited at all. this is a very, incredibly, widespread case. in fact i would be willing to bet money that this is a more common case than when you have multiple instructions being used together in such a way as to make this method excessively inconvenient (meaning, you might as well just use single pointer dumps) or impossible (would bet money that this would be a laughably fringe case). at the very minimum, it is very much applicable to GBA and NDS hacking, which are fairly popular systems to romhack.

(and no, as i said before, it does not have to be absolute pointers -- relative pointers are fine too. though absolute is preferred, for increased specificity due to lack of starting at 0.)
 
Quote
I think I am clear, but I still think it is extremely limited in application. Let's talk 'brute force'.

1. Ok, you start in the data area. You dump what's pointed to by 4-byte <String1Pointer>. Move to the next 'pointer' which is <SomeOtherCrapDataAssociatedWithTheMenuOrWhatever1>. The program won't crash on an 'invalid' pointer. However, what if this data is 1 byte long and then the real pointers continue after that? You were dumping by 4-byte long pointers. The entire thing is thrown completely off from the first invalid pointer onwards unless the pointers and data are 4-byte aligned.

at least on GBA/NDS, it makes no sense to do this, as it is an incredible amount of extra work that no one would go through the effort to do, seeing as how there is no benefit whatsoever. pointers should always be word-aligned or maybe half-word aligned (for 2-byte relative pointers) depending on how the code is set up. is that sort of thing commonplace in 6502 or 65816? even so, you can still brute force it by checking for a pointer at every location instead of limiting yourself to being word-aligned.
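
to make the brute force idea concrete, here is a small python sketch. it assumes 32-bit little-endian pointers into GBA cartridge ROM (mapped at 0x08000000); the function name and the `step` parameter are made up for illustration.

```python
import struct
from typing import Iterator, Tuple

GBA_ROM_BASE = 0x08000000  # where GBA cartridge ROM is mapped


def candidate_pointers(rom: bytes, start: int, end: int,
                       step: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (location, target file offset) for plausible 32-bit pointers.

    step=4 checks only word-aligned locations; step=1 checks every byte.
    """
    for loc in range(start, min(end, len(rom) - 3), step):
        (value,) = struct.unpack_from("<I", rom, loc)
        target = value - GBA_ROM_BASE
        # Skip values that don't point inside the ROM instead of failing.
        if 0 <= target < len(rom):
            yield loc, target
```

with step=1 a stray 1-byte field in front of the table no longer throws everything off; the price is more false positives to weed out afterwards.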

Quote
Now 'pointer search' method.

2. I see a little more merit in this method. However, this goes back to the first paragraph where it's only going to work where the pointers are absolute

just to reiterate, no.

Quote
and bytes stored together

again just to reiterate, this is a very frequent occurrence. certainly frequent enough to merit inclusion.

Quote
but I don't see that as often on other platforms.

which other platforms? even without having any experience with anything but PC programming and GBA/NDS ASM i would again be willing to bet money that just about any 32-bit platform would find my method to be applicable, with varying degrees of specificity and convenience improvement vs. the alternatives. do you have real experience with many different platforms to be able to make this kind of statement...

Quote
Yes, that's what I'm saying. It's useful only in specific cases that aren't too common on some platforms, but more common on others. I'd rather work on something with more wide spread, flexible application.

... and more importantly, this kind of judgment? if not, what you're essentially saying is "i haven't seen much use for this on NES/SNES, so it's not worth including". which i find curious considering NES/SNES are not the only systems people are romhacking, and in fact there is ongoing discussion right now (http://www.romhacking.net/forum/index.php/topic,11242.0.html) about how there should be more documentation for hacking newer systems.

Quote
You can't search for relative pointers when you don't know what they're relative to. Games I've seen that use this have a base value they use and calculate the actual pointer by manipulating a relative pointer and adding it to the base. What is the base?

this is irrelevant. you would need to know the base to use any method at all other than RAW dumping.

Quote
What is the manipulation? It may be a shift operation, it may be more.

as long as the least significant part of the pointer is simply added to the result (which is necessarily true if the string being pointed to can start on any byte/doesn't need to be word or halfword aligned, which i suspect forms the majority of cases), this is irrelevant.

Quote
I meant the load AND the pointer bytes themselves are split. the load is across multiple instructions and the pointer bytes themselves are stored separately and combined by the instructions.

once again irrelevant, in the example you gave, because it was basically a relative pointer with offset $cb0000. if each pointer uses a different initial instruction, then yes, it would not be convenient enough to be worth using (and coincidentally enough (not), as i pointed out above, pointer lists would also be useless, unless you're keeping track of that initial instruction for each individual entry... in which case it's not really a pointer list anymore). if a lot of pointers use $cb, however, it would be very useful.
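
a minimal python sketch of searching for that kind of pointer: given a string's address and the shared base (e.g. $cb0000), look for the 16-bit remainder. it assumes the low word is stored little-endian and simply added to the base; the function name is hypothetical, and mapping between CPU addresses and file offsets is left out.

```python
import struct
from typing import List


def find_low_word_pointers(rom: bytes, target_address: int, base: int) -> List[int]:
    """Find 16-bit little-endian values equal to target_address - base."""
    low = target_address - base
    if not 0 <= low <= 0xFFFF:
        return []  # target is not reachable from this base
    needle = struct.pack("<H", low)
    hits = []
    pos = rom.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = rom.find(needle, pos + 1)
    return hits
```

two bytes are far less specific than four, so expect extra matches that have to be filtered by location or by hand.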

Quote
I'd much rather see a more program intelligent way to address these situations that would handle your case and some of the ones I have brought up. Off the top of my head, a pointer list generating program with some ability for patterns, wild cards, equations, and searching might be what the doctor ordered.

implementing regex or some kind of complicated parser would be very powerful indeed. however, it would probably be a pain to do -- both on your end and the user end. much simpler, both for yourself and the user, would be to simply include pointer search and brute force. (not that you have to actually DO anything to support brute force, other than not have it crash or halt execution when it hits an invalid pointer :P)
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Nightcrawler on August 11, 2010, 02:05:44 pm
Klarth:
I caught your post squeezed in there. I can see your logic. We'll see how it goes. No guarantees that my utility will serve these purposes.

Kajitani-Eizan:

Quote
that doesn't make any sense. if you can put it in a pointer list, you can search for it. (or did you simply mean that cartographer doesn't currently support it, but your new program will?)

You, as a human, physically add it to the list file. The list is loaded by the dumper and dumps them. This is how pointers that no program can currently detect can still easily be dumped. That doesn't make sense?
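
As a rough illustration of that workflow, here is a Python sketch. The list format (one hex string address per line, `#` for comments) is hypothetical and is not Cartographer's actual format; the function names and the single end token are also assumptions.

```python
from typing import List


def load_pointer_list(text: str) -> List[int]:
    """Parse one hex string address per line; '#' starts a comment."""
    addresses = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            addresses.append(int(line, 16))
    return addresses


def dump_strings(rom: bytes, addresses: List[int], end_token: int = 0x00) -> List[bytes]:
    """Return the raw bytes of each string, up to but excluding the end token."""
    dumped = []
    for address in addresses:
        stop = rom.find(bytes([end_token]), address)
        if stop == -1:
            stop = len(rom)  # no end token found: dump to end of file
        dumped.append(rom[address:stop])
    return dumped
```

The dumper never has to know how each address was found, which is the point: the human does the detective work and the tool just reads the list.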

Quote
???? this is not limited at all. this is a very, incredibly, widespread case. in fact i would be willing to bet money that this is a more common case than when you have multiple instructions being used together in such a way as to make this method excessively inconvenient (meaning, you might as well just use single pointer dumps) or impossible (would bet money that this would be a laughably fringe case). at the very minimum, it is very much applicable to GBA and NDS hacking, which are fairly popular systems to romhack.

You believe your method is not limited at all? I'm not going to argue with you. As surely as it is common on 32-bit platforms, it is not so common on 8-bit and/or 16-bit platforms. It can't be, because of their register vs. pointer size mismatches. An 8-bit register can't hold a 16-bit pointer, for example. The reason you see it so frequently on GBA is due to the way the ARM series (and many modern) processors work vs. pointer size. NDS is immaterial to me as I do not support hacking current generation systems. I prefer a solution applicable to old and new, ideas for which were already discussed. I am of course also biased toward adding features that directly benefit my own projects since that's the primary reason for the tool's existence in the first place.
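
One layout where the register size shows up directly is a pair of parallel tables, one holding the low bytes and one holding the high bytes. The Python sketch below is only an illustration of that layout under my own assumptions; the table offsets and the function name are hypothetical and not taken from any particular game.

```python
from typing import List


def read_split_pointer_table(rom: bytes, lo_table: int, hi_table: int,
                             count: int) -> List[int]:
    """Combine parallel low-byte and high-byte tables into 16-bit pointers."""
    return [
        rom[lo_table + i] | (rom[hi_table + i] << 8)
        for i in range(count)
    ]
```

Searching the file for the two pointer bytes side by side finds nothing in this layout, because they are never adjacent.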

Quote
which other platforms? even without having any experience with anything but PC programming and GBA/NDS ASM i would again be willing to bet money that just about any 32-bit platform would find my method to be applicable, with varying degrees of specificity and convenience improvement vs. the alternatives. do you have real experience with many different platforms to be able to make this kind of statement...

Is this the e-peen battle question? :P Over the past 13 years I have done at least some degree of hacking on GB, SMS, GENESIS, NES, SNES, TG16, GBA and PC. I also know exactly what you're saying about 32-bit ARM CPUs and how they function with byte alignment. I am an electrical engineer and work with firmware and hardware design regularly. </end e-peen>

You want to discount all other platforms as less relevant and you're upset I don't hold enough weight for GBA, PC, and current generation systems.  Nothing we can do about that but respect one another's opinion on the matter. You've made your point. The feedback is appreciated. I will take it under advisement. :)
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Kajitani-Eizan on August 11, 2010, 09:52:48 pm
Quote
You, as a human, physically add it to the list file. The list is loaded by the dumper and dumps them. This is how pointers that no program can currently detect can still easily be dumped. That doesn't make sense?

no, it doesn't. if you, as a human, can physically add it to the list file, a program can search for it and find it (probably). i think i have said this several times now, so i should probably do something different: can you explain how a human can compile a list of pointers that cannot be found by a program? what is the impediment to the program finding them? exactly what is the complicated analysis operation that the human is doing that the program cannot? what extra information does this "pointer list" that a human would compile contain besides the "main" part of the pointer itself, and why, if this extra information were provided to the program, would the program not be able to find the pointers in question?

Quote
The reason you see it so frequently on GBA is due to the way the ARM series (and many modern) processors work vs. pointer size.

okay, now we're getting somewhere. so you admit that it is something that is seen "frequently" and thus not "very limited" in scope.

Quote
NDS is immaterial to me as I do not support hacking current generation systems. I prefer a solution applicable to old and new, ideas for which were already discussed. I am of course also biased toward adding features that directly benefit my own projects since that's the primary reason for the tool's existence in the first place.

oh. well, if you would just say this in the first place, it would lead to a lot less confusion. what you want to work on is up to you. however, when you claim something is "not tangible" and "limited" in scope when that's simply not true, it leads to lots of unnecessary discussion :P

i apologize if it seemed i was trying to initiate e-penis comparisons... i was just very confused as to how you could make sweeping statements about how limited in scope this method would be, when it is not. now that you have clarified that you are focusing only on older systems, essentially disqualifying most of the wide variety of systems that i was assuming would find this method useful, it makes more sense. however...

Quote
You want to discount all other platforms as less relevant and you're upset I don't hold enough weight for GBA, PC, and current generation systems.  Nothing we can do about that but respect one another's opinion on the matter. You've made your point. The feedback is appreciated. I will take it under advisement. :)

um, no. i am not discounting other platforms as less relevant. that is what you are doing, ironically. i am proposing a method that i think is very useful for newer platforms. including it would not harm support for older platforms in any way whatsoever; it would just make things easier for those who happen to be hacking those newer platforms. i'm not sure why a feature should be excluded solely on the basis of it not necessarily being useful for the older generation, nor on the basis that it is not "tangible" or "reliable". when you hack a game, not everything you do will be "perfect" in the way you seem to be insistent on. if you use slightly "imperfect" methods to achieve a clean and properly working end result, i don't see what the problem is. put another way, i don't see the value in pursuing such an... intangible... ideal.

anyway, i suppose i have said my part... whether you want to include it or not is up to you :P
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Tauwasser on August 12, 2010, 06:49:30 am
Quote
at least on GBA/NDS, it makes no sense to do this, as it is an incredible amount of extra work that no one would go through the effort to do, seeing as how there is no benefit whatsoever. pointers should always be word-aligned or maybe half-word aligned (for 2-byte relative pointers) depending on how the code is set up. is that sort of thing commonplace in 6502 or 65816? even so, you can still brute force it by checking for a pointer at every location instead of limiting yourself to being word-aligned.

While this is certainly true for a majority of games, some games do go the extra mile and load pointers byte-wise. Just look at the Pokémon franchise. I also saw it in another Japanese ROM, but alas, I forgot the name of that game. However, I think all pointers used by ASM routines will be aligned, since no compiler will split those up, I hope. However, for some pointers in some data, this doesn't hold.

cYa,

Tauwasser
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Kajitani-Eizan on August 12, 2010, 07:50:27 am
LOL

to be fair, i myself coded a routine that loaded pointers using half-words. but this was to allow you to stick pointers directly into/on top of double byte text strings, i.e. it was a rom hack. i'm pretty confused as to why a compiled, non-hacked game would need to do this, because (a) the only reason i can think of is if they too stuck pointers into text strings and (b) there's no good reason for them to do (a), i think.
Title: Re: Translations: Table File Standard, Generic Dumpers, and more!
Post by: Tauwasser on August 12, 2010, 08:02:51 am
Well, it's for in-game scripting routines. Possibly because they wanted to save space by using variable-width commands instead of words. So one command can be 6 bytes long, the next only 3 etc. Therefore pointers won't be aligned and need to be read byte-wise.
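
A small Python sketch of what such a script walker might look like. The opcode table is entirely hypothetical (it is not the Pokémon command set), and 32-bit little-endian pointers are assumed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical opcode table: opcode -> (total command length, offset of a
# 4-byte pointer operand within the command, or None if there is none).
COMMANDS: Dict[int, Tuple[int, Optional[int]]] = {
    0x01: (3, None),  # opcode + 2 argument bytes
    0x02: (6, 1),     # opcode + 4-byte pointer + 1 argument byte
    0xFF: (1, None),  # end of script
}


def read_pointer_bytewise(data: bytes, offset: int) -> int:
    """Assemble a little-endian 32-bit value one byte at a time."""
    return (data[offset]
            | (data[offset + 1] << 8)
            | (data[offset + 2] << 16)
            | (data[offset + 3] << 24))


def collect_script_pointers(data: bytes, offset: int) -> List[Tuple[int, int]]:
    """Walk commands from offset; return (pointer location, pointer value)."""
    found = []
    while offset < len(data):
        opcode = data[offset]
        length, pointer_at = COMMANDS[opcode]  # KeyError on unknown opcode
        if pointer_at is not None:
            location = offset + pointer_at
            found.append((location, read_pointer_bytewise(data, location)))
        if opcode == 0xFF:
            break
        offset += length
    return found
```

With a 6-byte command followed by a 3-byte one, the second pointer in a script like this lands on an offset that is not a multiple of 4, so a word-aligned scan would step right over it.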

cYa,

Tauwasser