Programming / Re: Table File Standard Discussion
« on: August 27, 2011, 04:51:35 pm »
Hi,
Sorry for my long absence, but real life grabbed me by the neck... So anyway, I re-read the thread from start to finish again and I'll share my thoughts with you:
- I'm okay with comments being gone.
- I would like to see newlines go, too.
- End tokens can move to the utility realm.
Table matches with hex literals
I think we should count hex literals as matches for the current table in the event the table produced (or would have produced) the literal. That is to say:
- An unlimited table would have counted an unknown byte as a fallback case and fallen back to the table below it.
- If the stack is empty, or the table is not unlimited, the hex literal counts towards it in the insertion direction.
This behavior mirrors dumping. It can produce cases that are not correct for the game's display mechanism, but since this case is ambiguous, either choice can. This would extend to table matches other than infinite and one match(es).
Restricting table matches to infinite or one match(es)
I think this is a good restriction. All computer text encodings ever produced were flat within their code space. However, we should think about the following:
- Many game text encodings are stateful. This is not the case for Unicode or any other traditional encodings including multi-byte encodings that I'm aware of. Even legacy multibyte encodings can be managed inside one flat table, i.e. Klarth's last example.
- Multi-table options, including multiple match ranges activated from one table A to another table B, are solvable and usable. It is just that we determined them to be in a polynomial complexity class.
The problems we are facing are inherent to implementations that knew the restrictions on their input. We don't necessarily have that luxury. Notice that the last point also indicates that I think the kanji array problem is solvable, however, possibly not in linear or quadratic time. I'm not so much concerned about memory, as we happen to have a lot in machines built after 2005.
I'm in favor of restricting the current release of the standard and extending the syntax once we find a suitable solution to a suitable problem that uses multiple matches within one table other than unlimited matches, or different match ranges in the same table B reachable from some table A. So I would definitely be in favor of keeping the current tablename,№Matches syntax.
Direct table fallback
I'm in favor of direct table fallback using the proposed !7F=,-1 syntax. Not having it might cause a theoretically infinite stack of tables for an infinitely long input string. It's a pretty big oversight and I'm happy that somebody caught it.
Algorithm selection
The current draft does not contain anything pertaining to that. So I have to ask again: We are in favor of multiple allowed insertion algorithms along with compliance levels, right? By compliance levels I mean that any implementation of the standard must indicate, for instance, "TFS Level 1 compliance" for implementing "longest prefix insertion" and "TFS Level 2 compliance" for implementing "longest prefix insertion and A* optimal insertion". Implementations also must not make up new compliance levels, since those might be added in a revision of the TFS.
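To make the Level 1 idea concrete, here is a minimal sketch of longest prefix insertion in Python. It is only an illustration under my own assumptions: the function name, the dict-based table and the example entries are hypothetical and not taken from the draft, and table switching is ignored entirely.

```python
# Hypothetical flat table: text sequence -> encoded bytes (not from any real game).
EXAMPLE_TABLE = {
    "a": b"\x01",
    "b": b"\x02",
    "c": b"\x03",
    "d": b"\x04",
    "ab": b"\x10\x11",
    "bcd": b"\x20",
}


def longest_prefix_insert(text, table):
    """Encode text greedily, always taking the longest table entry that
    matches at the current position."""
    max_len = max(len(key) for key in table)
    out = bytearray()
    pos = 0
    while pos < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            chunk = text[pos:pos + length]
            if chunk in table:
                out += table[chunk]
                pos += length
                break
        else:
            raise ValueError(f"no table entry matches text at offset {pos}")
    return bytes(out)


encoded = longest_prefix_insert("abcd", EXAMPLE_TABLE)
```

With this made-up table, the greedy pass picks "ab", "c", "d" and emits four bytes, although "a" followed by "bcd" would only need two. That gap is the kind of case an optimal insertion level would be meant to cover.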
Control Codes
First of all, yes, linked entries should be renamed to make their purpose clearer.
Also, getting an "insertion test string" for grouping input arguments is simple using regular expressions (or some form of linear substitution method of your choice). For that matter, it seems we should only allow one-byte arguments to avoid endianness issues, and we can simplify the arguments to %D, %X, %B for decimal, hexadecimal and binary respectively.
<window x=%X y=%D> can be easily turned into a match string using the following substitutions (as well as substitutions to mark user-supplied {}[] etc. as literals):
- %D → [0-9]\{1,3\}
- %X → \$[0-9A-Fa-f]\{2\}
- %B → %[01]\{8\}
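The substitutions above can be sketched in Python roughly as follows. This is a hedged illustration, not a proposed implementation: `compile_template` and `parse_arguments` are hypothetical names, the patterns are written in Python's regex dialect (unescaped braces) rather than the escaped form listed above, and I added capture groups so the argument values can be read back.

```python
import re

# Placeholder -> regex fragment, mirroring the substitution list above.
# The "$" and "%" identifiers stay part of the pattern.
PLACEHOLDERS = {
    "%D": r"([0-9]{1,3})",
    "%X": r"\$([0-9A-Fa-f]{2})",
    "%B": r"%([01]{8})",
}
BASES = {"%D": 10, "%X": 16, "%B": 2}


def compile_template(template):
    """Turn a control code template into (compiled regex, list of bases)."""
    pattern = ""
    bases = []
    # The capturing group makes re.split keep the placeholders in the result.
    for part in re.split(r"(%[DXB])", template):
        if part in PLACEHOLDERS:
            pattern += PLACEHOLDERS[part]
            bases.append(BASES[part])
        else:
            # Everything else, including user-supplied {}[] etc., is a literal.
            pattern += re.escape(part)
    return re.compile(pattern), bases


def parse_arguments(template, text):
    """Return the argument values as ints, or None if text does not match."""
    regex, bases = compile_template(template)
    match = regex.fullmatch(text)
    if match is None:
        return None
    return [int(group, base) for group, base in zip(match.groups(), bases)]


args = parse_arguments("<window x=%X y=%D>", "<window x=$1F y=10>")
```

Note that the decimal pattern accepts values up to 999, so a real implementation restricted to one-byte arguments would still need a range check afterwards.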
Notice that we do need to keep identifiers in the output, contrary to what was suggested. If we do not do that, we are open to the following exploits:
Code:
!7E=<window x=%X y=%X>
!7F=<window x=%D y=%D>
How do you parse <window x=10 y=10> without additional context information? You can't. Also, we might want to include signed decimal and signed hex.
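A small Python sketch of that ambiguity, under the assumption that hex arguments carry a "$" identifier when identifiers are kept. The helper names and the `keep_identifiers` switch are hypothetical and only exist for this demonstration.

```python
import re

# The two entries from the example above, keyed by their hex value.
ENTRIES = {
    "7E": "<window x=%X y=%X>",
    "7F": "<window x=%D y=%D>",
}


def to_regex(template, keep_identifiers):
    """Build a match pattern; optionally drop the '$' identifier for hex."""
    hex_pattern = r"\$[0-9A-Fa-f]{2}" if keep_identifiers else r"[0-9A-Fa-f]{2}"
    pattern = ""
    for part in re.split(r"(%[DX])", template):
        if part == "%D":
            pattern += r"[0-9]{1,3}"
        elif part == "%X":
            pattern += hex_pattern
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


def candidates(text, keep_identifiers):
    """Return the hex values of all entries whose pattern matches text."""
    return sorted(
        code
        for code, template in ENTRIES.items()
        if to_regex(template, keep_identifiers).fullmatch(text)
    )


ambiguous = candidates("<window x=10 y=10>", keep_identifiers=False)
unambiguous = candidates("<window x=10 y=10>", keep_identifiers=True)
```

Without identifiers both entries match the same input; with them, only the decimal entry does, and `<window x=$10 y=$10>` selects the hex entry.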
Variable parameter list
First of all, I must admit that I have never ever seen a game use these. Then again, it's not impossible. The old syntax would have lent itself rather well to defining these:
Code:
!7F=<window>, 256, 00
The argument length would constitute a sensible maximum argument list limit, and the optional parameter after the second comma would indicate the list end identifier. Having said this, I'm neither strongly in favor of nor strongly opposed to including variable argument lists in any form or fashion.
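For the dumping direction, such an entry could be read roughly like this. It is a sketch under my own assumptions: the function name and return shape are hypothetical, arguments are taken to be single bytes, and what should happen when the limit is hit without seeing the end identifier is left open (here it is only reported via a flag).

```python
def read_variable_args(data, start, max_args=256, terminator=0x00):
    """Collect one-byte arguments from data[start:] until the terminator
    byte or the maximum list length is reached.

    Returns (args, next_position, terminated).
    """
    args = []
    pos = start
    while pos < len(data) and len(args) < max_args:
        byte = data[pos]
        pos += 1
        if byte == terminator:
            # The end identifier is consumed but not part of the list.
            return args, pos, True
        args.append(byte)
    # Ran into the limit or the end of the data without an end identifier.
    return args, pos, False


# 0x7F is the control code itself; its argument list starts at offset 1.
result = read_variable_args(bytes([0x7F, 0x05, 0x0A, 0x00, 0x41]), 1)
```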
Regular expression table entries
I still think they are useful, but they are a burden to implement and to make safe with the current control code set. Their main purpose was commenting, and that has been pushed to top-level utilities. For normal entries nothing is gained by this anymore. For control codes and table options, blindly using user-controlled regex content upon output might become a safety issue...
Multiple tables per file
Oppose because of the added complexity with almost no gain.
I hope I did not miss anything that was supposed to be addressed. Also, please mark this thread as new so other regulars get an email notification about changes.
cYa,
Tauwasser