
Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - Tauwasser

1
Programming / Re: Table File Standard Discussion
« on: August 27, 2011, 04:51:35 pm »
Hi,

Sorry for my long absence, but real life grabbed me by the neck... So anyway, I re-read the thread from start to finish again and I'll share my thoughts with you:

  • I'm okay with comments being gone.
  • I would like to see newlines go, too.
  • End tokens can move to the utility realm.

Table matches with hex literals

I think we should count hex literals as matches for the current table whenever that table produced (or would have produced) the literal. That is to say:
  • an unlimited table would have counted an unknown byte as a fallback case and fallen back to the table below it on the stack, so a hex literal triggers the same fallback on insertion;
  • if the stack is empty, or the table is not unlimited, the hex literal counts towards that table's matches in insertion direction.

This behavior mirrors dumping. It can produce cases that are not correct for the game's display mechanism, but since this case is ambiguous, either choice can. This would extend to table matches other than infinite and one match(es).

Restricting table matches to infinite or one match(es)

I think this is a good restriction. After all, virtually every computer text encoding ever produced has been flat within its code space. However, in that case, we should think about the following:

  • Many game text encodings are stateful. This is not the case for Unicode or any other traditional encoding, including multi-byte encodings, that I'm aware of. Even legacy multi-byte encodings can be managed inside one flat table, i.e. Klarth's last example.
  • Multi-table options, including multiple match ranges activated from one table A to another table B, are solvable and usable. It is just that we determined they fall into a polynomial complexity class.

The problems we are facing were sidestepped by implementations that knew the restrictions on their input. We don't necessarily have that luxury. Notice that the last point also indicates that I think the kanji array problem is solvable, though possibly not in linear or quadratic time. I'm not so much concerned about memory, as we happen to have a lot of it in machines built after 2005.

I'm in favor of restricting the current release of the standard and extending the syntax once a suitable solution is found for a suitable problem that uses multiple matches within one table, other than unlimited matches or different match ranges in the same table B reachable from some table A. So I would definitely be in favor of keeping the current tablename,№Matches syntax.

Direct table fallback

I'm in favor of direct table fallback using the proposed !7F=,-1 syntax. Not having it might cause a theoretically infinite stack of tables for an infinitely long input string. It's a pretty big oversight and I'm happy that somebody caught it.

Algorithm selection

The current draft does not contain anything pertaining to that. So I have to ask again: we are in favor of multiple allowed insertion algorithms along with compliance levels, right? By compliance levels I mean that any implementation of the standard must indicate, for instance, "TFS Level 1 compliance" for implementing "longest prefix insertion" and "TFS Level 2 compliance" for implementing "longest prefix insertion and A* optimal insertion", and must not make up new compliance levels of its own; new levels would only be added in a revision of the TFS.

Control Codes

First of all, yes, linked entries should be renamed to make their purpose clearer.
Also, getting an "insertion test string" for grouping input arguments is simple using regular expressions (or some form of linear substitution method of your choice). For that matter, it seems we should only allow one-byte arguments to avoid endianness issues, and we can simplify the arguments to %D, %X, %B for decimal, hexadecimal, and binary, respectively.

<window x=%X y=%D> can be easily turned into a match string using the following substitutions (as well as substitutions to mark user-supplied {}[] etc. as literals):

  • %D → [0-9]\{1,3\}
  • %X → \$[0-9A-Fa-f]\{2\}
  • %B → %[01]\{8\}
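As a rough illustration of how mechanical this is (purely my own sketch, not part of the standard; it uses Python's {n} quantifier syntax instead of the escaped \{n\} form above), something like the following would do:

Code: [Select]
import re

# Hypothetical helper (my own, not part of the standard): turn a control code's
# output string such as "<window x=%X y=%D>" into a regular expression that
# matches the corresponding dumped text.
def entry_to_regex(entry):
    pattern = re.escape(entry)  # mark user-supplied {}[] etc. as literals
    # Replace the (escaped) placeholders with the patterns from the list above.
    pattern = pattern.replace(re.escape("%D"), r"[0-9]{1,3}")
    pattern = pattern.replace(re.escape("%X"), r"\$[0-9A-Fa-f]{2}")
    pattern = pattern.replace(re.escape("%B"), r"%[01]{8}")
    return pattern

# A dumped "<window x=$A4 y=10>" is now recognized by the generated expression:
print(bool(re.fullmatch(entry_to_regex("<window x=%X y=%D>"), "<window x=$A4 y=10>")))  # True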

Notice that we do need to keep identifiers in the output, contrary to what was suggested. If we do not do that, we are open to the following exploits:

Code: [Select]
!7E=<window x=%X y=%X>
!7F=<window x=%D y=%D>

How do you parse <window x=10 y=10> without additional context information? You can't. Also, we might want to include signed decimal and signed hex.

Variable parameter list

First of all, I must admit that I have never ever seen a game use these. Then again, it's not impossible. The old syntax would have lent itself rather well to defining these:

Code: [Select]
!7F=<window>, 256, 00
The argument length would constitute a sensible maximum argument list limit, and the optional parameter after the second comma would indicate the list end identifier. Having said this, I'm neither strongly in favor of nor strongly opposed to including variable argument lists in any form or fashion.

Regular expression table entries

I still think they are useful, however they are a burden to implement and to make safe with the current control code set. Their main purpose was commenting, and that has been pushed to top-level utilities. For normal entries nothing is gained by this anymore. For control codes and table options, it might become a safety issue to blindly use user-controlled regex content upon output...

Multiple tables per file

I oppose this because of the added complexity with almost no gain.

I hope I did not miss anything that was supposed to be addressed. Also, please mark this thread as new so other regulars get an email notification about changes.

cYa,

Tauwasser

2
General Discussion / Re: Help Jedi with His Algebra
« on: August 11, 2011, 07:28:01 pm »
Gauss-Jordan was not an option here.  There was another problem just for that.  Believe me, I didn't prefer Gaussian Elimination. >:(

Gauss is Gauss-Jordan! The only difference is that the pivot's row (within the current sub-matrix) is added to one less row at each step of the algorithm instead of being added to all other rows in every step! Even though the Gauss algorithm came first, there is really not much difference.

I would also like to point out that you did not perform the Gauss algorithm but the Gauss-Jordan algorithm in the first place, so don't shout at me for that! If you had used the Gauss algorithm, you would have substituted your answer for z into (3R_2 + R_1) or the one below that on the left hand side and worked from there in any of the original three equations by substituting y and z to get x. Instead, you performed further steps to get the identity of each variable in its own right, which is what the last Gauss-Jordan block in my exemplary solution does ― and which you performed as well in the lower right-hand corner. To do the original Gauss algorithm, you should have stopped at the point above where I wrote "basically done" and substituted the known variables back.

This is college algebra.  And simply saying 'no offense' doesn't alleviate anything you say.  It still sounds like you're trying to be offensive.

I know, but I notice a lot of Americans really like to say no offense for no particular reason and without using particularly kind words preceding or following that statement. So naturally, I thought I'd try it :) But just for the record: I think your math is too easy and yet you still struggle with it. This is aggravating to me, because back here in good ol' Germany, we don't exactly get our degrees handed to us on a silver platter. My US high school experience also adds to that feeling, because I know that excellent and well-prepared math courses do exist in the US, yet a lot of students take three consecutive years of Geometry class where they must endure the excruciating pains of basic shapes...  ::)

cYa,

Tauwasser

3
General Discussion / Re: Help Jedi with His Algebra
« on: August 11, 2011, 01:41:13 pm »
Wow, that's a perfect example of how not to do Gaussian elimination, way to go!

[  3  -5  -14 |  -4 ]
[  1  -2    1 |   5 ]
[  2  -6   37 | -10 ]

=>

[  0   1  -17 | -19 ]
[  1  -2    1 |   5 ]
[  0  -2   35 | -20 ]

=>

[  0   1  -17 | -19 ]
[  1   0  -33 | -33 ]
[  0   0    1 | -58 ]

Basically done. You can resubstitute or do one more pivot, your choice. You'd need a calculator or be really bright either way.

[  0   1   0 | -1005 ]
[  1   0   0 | -1947 ]
[  0   0   1 |   -58 ]

Done. Net time to do it: 2 mins. How many questions were there?

The way you solved it is taught in eighth grade in Germany, no offense intended. Just use the Gauss-Jordan method the way it was originally designed, with a pivot element, and you're all set, even for bigger systems and multiple solutions. Don't do that fraction stuff and don't write down all the x, y, z crap. Concentrate on the important parts: the coefficients that actually determine the system. I seriously hope this is beginner's math or math for art students...
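For the record, the pivot scheme is mechanical enough to fit in a few lines of code. A rough Python sketch of my own (exact arithmetic with fractions; this is just an illustration, not anything from the course):

Code: [Select]
from fractions import Fraction

def gauss_jordan(aug):
    """Reduce an augmented matrix (list of rows) to reduced row echelon form."""
    m = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols - 1):
        # Find a row at or below r with a non-zero pivot in column c.
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][c] for x in m[r]]      # scale the pivot row to get a leading 1
        for i in range(rows):                   # eliminate column c from every other row
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return m

# The system above: the last column gives x = -1947, y = -1005, z = -58.
print([row[-1] for row in gauss_jordan([[3, -5, -14, -4], [1, -2, 1, 5], [2, -6, 37, -10]])])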

Also, the teacher was probably born on 10 May 1947 and was 58 years old when he designed the question in 2005.

cYa,

Tauwasser

4
General Discussion / Re: Help Jedi with His Algebra
« on: July 20, 2011, 05:46:25 pm »
Hi Jedi,

Have you ever actually read a textbook describing synthetic/short division? Nobody in your thread seems to have the slightest clue what synthetic division actually is... So they try literally all other kinds of methods... Here goes:

Set up the chart for:

1*x³ + 2*x² - 3*x¹ - 6*x⁰

Write down all coefficients for all powers of x (use zeros when the power of x is not present).

1   2   -3   -6

Good job! Now you know √(3) is a root of this function. So you know when you factor it, (x - √(3)) will be a factor!

             1   2   -3   -6
x - √(3)

Now, the coefficient for the highest power of x is copied as-is:

             1   2   -3   -6
x - √(3)
             1

Now, the algorithm is pretty much straight-forward. You multiply the resultant number with the root, i.e. √(3) here. Then you add the resultant numbers row by row.

             1   2          -3   -6
x - √(3)         √(3)
             1   2 + √(3)

then

             1   2          -3          -6
x - √(3)         √(3)       2√(3) + 3
             1   2 + √(3)   2√(3)

then

             1   2          -3          -6
x - √(3)         √(3)       2√(3) + 3    6
             1   2 + √(3)   2√(3)        0

Yep, it really is a root. If it were not a root, the results wouldn't add up to 0 in the x⁻¹ power. Why the x⁻¹ power, you ask? Because the results have all coefficients shifted left by one power, since you just factored one power out: the x in (x - √(3)). So what the above is really saying is the following:

1*x³ + 2*x² - 3*x¹ - 6*x⁰ = (x - √(3)) * (x² + (2 + √(3))*x¹ + 2√(3)*x⁰).

Notice how the right hand side uses the coefficients we just had in the result row for the powers of x.
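For what it's worth, the whole chart boils down to a single multiply-and-add loop. A quick Python sketch of my own (using floats for the irrational root, so expect a tiny rounding residue instead of an exact 0):

Code: [Select]
import math

def synthetic_division(coeffs, root):
    """Divide a polynomial (coefficients, highest power first) by (x - root).
    Returns the quotient coefficients and the remainder."""
    out = [coeffs[0]]                      # copy the leading coefficient down as-is
    for c in coeffs[1:]:
        out.append(c + out[-1] * root)     # multiply by the root, add the next coefficient
    return out[:-1], out[-1]

# x^3 + 2x^2 - 3x - 6 divided by (x - sqrt(3)):
quotient, remainder = synthetic_division([1, 2, -3, -6], math.sqrt(3))
print(quotient)    # approximately [1, 2 + sqrt(3), 2*sqrt(3)]
print(remainder)   # approximately 0, confirming sqrt(3) is a root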

Now, you can use this scheme to find the other two roots. Set up like this:
             1   2 + √(3)   2√(3)
x - k                      -2√(3)
             1                  0

You know that the root should yield 0 in the last coefficient, so you can deduce that the last multiplication must yield -2√(3). However, if k > 0, you will not have a sign change, because you only multiply and add positive numbers. So now you already know k < 0. It should be pretty obvious that this yields

(2 + √(3) + k) * k = -2√(3) [this is the original equation]

This obviously has two solutions (it is just the original quadratic factor). However, the sign consideration gives away that k < 0, and the only two numbers left in play are 2 and √(3), so I think you see the point in trying either of them.

             1   2 + √(3)   2√(3)
x + √(3)        -√(3)      -2√(3)
             1   2              0

Wow, it worked.

Now this reads:

1*x³ + 2*x² - 3*x¹ - 6*x⁰ = (x - √(3)) * (x² + (2 + √(3))*x¹ + 2√(3)*x⁰) = (x - √(3)) * (x + √(3)) * (x + 2).

I hope you see that the right-most coefficients are the ones we just found.

You actually did not say whether the factorization should be done using synthetic/short division. Pretty much, you will always end up with quadratic equations corresponding to the original polynomial when trying to plug in some unknown x. However, this method is somewhat more visual in the sense that it lets one deduce whether the "next" root to solve for should be positive or negative, and which values of k one can try to get a "simple" result in the addition. (However, often this is not the case in real-world problems. Then again, you don't solve those by hand. This graphic method is mostly useful for higher-degree polynomials, where established solving methods are numerical only.)

Also, I might mention that your problem statement is lacking in the sense that any solution to the problem would only ever show the last panel of each division. Usually the division steps are not each written in their own panel like I did in the example above, because that would just waste time.

cYa,

Tauwasser

5
Programming / Algorithm Selection
« on: May 25, 2011, 06:40:33 pm »
New business:

I disagree with the "longest prefix" insertion rule from 2.2.4. With a slightly altered example:
12=Five
13=SixSeven
00=FiveSix
the "longest prefix" rule makes it impossible to insert the string "FiveSixSeven", despite that being valid output from a dumper. A less greedy algorithm would be able to insert "12 13" instead, which seems superior to inserting "00" but losing the "Seven".

You do? I'd love to hear your algorithm, especially if we happen to add a few more table values to the mix:

02=F
03=i
04=v
05=e
06=Fi
07=ve
08=ive

Now, what are you going to insert and how are you going to determine it? ;)

That is a basic depth-first tree search, so it's not overly complicated to implement. However, as we all know, the complexity is O(b^n), where n is the maximum number of levels and b is the average branching factor. Another way to think of this is as a transducer, which will naturally only finish on valid paths. The only criterion is that we need to find a shortest path, not enumerate all shortest paths.
Basically, an A* search with cost in bytes and a basic bytes-per-letter heuristic will do. This could also be expanded for target language and occurrence inside the script, to accommodate the simple fact that having lots of table entries doesn't mean all of them get used with the same probability. However, basic A* with cost in bytes and even a heuristic of zero will work out the shortest path directly. Since the heuristic must not overestimate the byte count, and a single table entry mapping 1 byte to more than 1 letter already means the minimum bytes per letter is < 1, we really deal with a (0,1) range of possible bytes-per-letter values here, so even an ideal heuristic for the source file will have little impact on finding the right way. The only nitpick is that normal entries like 3 bytes = 1 letter may also exist, and therefore the average bytes per letter (when entry probability is not taken into account) would be > 1. However, since a heuristic of 0.1 bytes per letter is still permissible, because it doesn't overestimate, the (first) shortest path will still be found.

Code: [Select]
01=F
02=i
03=v
04=e
05=Fi
06=ve
07=ive
08=iveS
09=ixSeven
10=Six
11=Seven

See PDF.

Cost is 1.2 at the start with h(x) = 0.1 * x, where x is the number of letters left in the buffer. A simple mean for bytes per letter would be 11/30 (30 letters per 11 bytes), so at 0.1 we're not even close to an optimal heuristic and we will see some jumping around towards the end because of this.
  • We expand the first node (red), find that "F" and "Fi" can be used. "F"'s path cost d(x) = 1 byte and we estimate another h(x) = 1.1 bytes left to go => 2.1 total. "Fi" only costs f(x) = 2.0 per this rationale [10 letters to go at 0.1 plus 1 byte already used].
  • Expand "Fi". Find "ve" (cost 2.8) and "v" (cost 2.9).
  • Expand "F" (2.1), find "i" (3.0), "ive" (2.8), "iveS" (2.7).
  • Expand "iveS", find "ixSeven" (3.0) [algorithms halting on h(x) = 0 will stop now, orange], i (3.6).
  • (Assume we haven't stopped, black), expand any other node and find that its cost (because d(x) >= 3 at this point) is greater than 3.0.
  • This goes on until all 2.x nodes and one 3.0 node (depending on implementation) have been expanded once.
  • Expand "ixSeven" (green) and notice it's our goal.

Before you ask, NC, this can be programmed with a simple stack that pushes node IDs back with reordering, or a list that is re-sorted after every insertion of nodes (in other words, a priority queue).

I hope to have demonstrated that this is neither a laughable nor an impossible claim or problem. However, having an admissible heuristic here is key, and a simple (unweighted) mean will most likely not do because of outliers; one would need to use the median or mode or some analysis of the input first. The degenerate cases are h(x) = 0 for all x, which turns A* into uniform-cost (breadth-first-like) search, and h(x) = x * [length of the longest hex sequence in the table], which overestimates and behaves like greedy depth-first search.
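To make this concrete, here is a minimal Python sketch of the search as I picture it (my own illustration, not reference code for the standard), with the path cost in bytes and the 0.1 bytes-per-letter heuristic from the example; in this toy table every entry is a single byte:

Code: [Select]
import heapq

# The example table, keyed by text for insertion (every hex value here is one byte).
TABLE = {"F": "01", "i": "02", "v": "03", "e": "04", "Fi": "05", "ve": "06",
         "ive": "07", "iveS": "08", "ixSeven": "09", "Six": "10", "Seven": "11"}

def astar_insert(text, table=TABLE, bytes_per_letter=0.1):
    """Return a minimum-byte tokenization of text as a list of hex strings."""
    # Frontier entries: (f = g + h, g = bytes spent, position in text, tokens so far).
    frontier = [(bytes_per_letter * len(text), 0.0, 0, [])]
    while frontier:
        f, g, pos, tokens = heapq.heappop(frontier)
        if pos == len(text):                         # all letters consumed: goal reached
            return tokens
        for entry, hexval in table.items():
            if text.startswith(entry, pos):
                cost = g + len(hexval) // 2          # path cost in bytes
                h = bytes_per_letter * (len(text) - pos - len(entry))
                heapq.heappush(frontier, (cost + h, cost, pos + len(entry), tokens + [hexval]))
    return None                                      # no tokenization exists

print(astar_insert("FiveSixSeven"))   # ['01', '08', '09'], i.e. F + iveS + ixSeven in 3 bytes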

In the absence of table switching, I would model my insertion algorithm [...] on perl's regular expression engine. [...] the engine is generally longest prefix, but adds the concept of backtracking - if it's not possible to complete tokenizing the string based on your current tokenization, go back a little bit and try a different tokenization.

This is basically A* over tokens with text length as the only cost, which is what all regex engines do for the greedy star. However, I'm currently not sure how this would work with the added complication of some tokens being multiple bytes long, which changes the cost from the same cost for every tokenization to a cost per tokenized token.
Having said that, a way to implement this via general-purpose regex engines would probably be more accessible in more programming languages.

You've just increased complexity by 10x for an otherwise trivial task.

Neither tokenization nor optimal search is a trivial task. Indeed, defining the search problem itself mathematically is not a trivial task. That's why so much brain power (and money) went into things like SQL queries and the like.

Has anybody ever actually written an inserter that behaves like this?

I'm not sure what you're thinking of as an inserter in this case, but pretty much every compression scheme that does not use sliding-window techniques will have to use a backtracking algorithm alongside a good heuristic to be optimal. So yes, people have written inserters that insert input data into an output file while weighing size and combination of mappings. It might just have been for binary data or the like.

Ran this one by Klarth. "Bad token selection can occur sometimes, but I'd estimate it very rare for it to be detrimental...unless it's a "gotcha table". The optimal algorithm is simply out of reach for most, and non desirable for the rest of us. Just because it may be more optimal doesn't mean it's desirable or the best choice for the job.

Not sure where the quote ends, so I assume it's all Klarth's. If so, I'd like to see him defend the point of view that an optimal algorithm is undesirable, because I cannot think of a single argument except burden of implementation, which he ruled out. Once a provably optimal algorithm is used, why is it not desirable? Speed-wise we can brute-force some 100 kB in a few seconds, so speed doesn't seem to be the issue here, does it?



After getting bored waiting for a single medium-length string to encode, I ended up abandoning the longest prefix idea altogether and created a different optimal insertion algorithm that runs in roughly linear time and memory instead. It took a couple hours to get everything working right, but the end result is only about 50 lines of code and it chews through a 200KB script in under a second.

I'm quite interested how you achieved and proved linear time/memory. Please don't hesitate to elaborate.

While on the topic of insertion algorithms... how are you handling table switching? I've been thinking more about this, and in the general case it presents an even larger can of worms than I was anticipating.

Since I'm the guy that introduced this, I can only say that ― yes ― it does pose a problem. However, my naïve solution was to use ordered data sets and do a longest prefix over all tables. You can still do this with A*, where the path cost is adapted to table switching as well, i.e. the path cost to an entry reached by switching from another table is the cost in bytes of the entry in the new table plus the cost in bytes of the switching code in the old table. This, of course, just needs some bookkeeping to know for each explored node which table it belongs to and how many matches that table may still make before switching back.

Since I have thought about it for some time now, I would actually like to introduce the notion of leveled compliance. Basically, one could incorporate into the standard more than one notion of insertion, and tools could then specify which version(s) they comply with. This way, tool X can comply with a) longest prefix, b) the proposed A* method (or maybe just a depth-first search etc.) and c) its own method. IMHO, this would also keep utility authors from inventing their own insertion technique without indicating it anywhere. Table files themselves would be required to stay the way they are for compatibility with other tools once the final draft is done. Versioning will likely happen anyway once some <glaring oversight that really should have been caught and everybody feels miserable about once the standard is out> has been identified and rectified in a subsequent version of the standard.

cYa,

Tauwasser

6
Programming / Some Basic Points
« on: May 25, 2011, 06:27:51 pm »
First of all, this post will be pretty long, and I would like to apologize to abw if it seems like I jump on his posts only. I originally discussed the table file standard with NC back on his own board and we pretty much figured out a way to do it. I'm also quite late to the party, which is why I cover almost every second point you make here.

except for post-processing [for table switching] of overlapping strings with bytes interpreted by different tables, which is still a mess

Can you elaborate on this one?

perhaps text sequences containing <$[0-9A-Fa-f][0-9A-Fa-f]> should be forbidden.

While developing the standard, we opted for simpler text checking algorithms, so we decided to give the user the power to do this at the cost of possible mix-ups. NC tried to tone complexity down as much as possible. Even regular expressions are considered a hindrance here, which I will elaborate on further down below. However, I would support it, and have already offered to design regular expressions for identifying entry types, so we might as well disallow such sequences.

Also, it might be more appropriate to include the section on inserting raw hex in 2.2.4 instead of in 2.2.1.

It seemed logical to include it there, because that way, the dumping and insertion process for hexadecimal literals is completely defined, instead of breaking these two up. 2.2.4 doesn't deal with literals at all right now.

Also also, it might be worth mentioning that hex characters are equally valid in upper or lower case (e.g. "AB" == "ab" =="Ab" == "aB").

This is addressed in 2.2.1 in the following manner:

Code: [Select]
"XX" will represent the raw hex byte consisting of digits [0-9a-fA-F].
Control Codes in End Tokens Requiring Hex Representation

I admit that having these commenting operations and dumped and to-be-inserted text in the same file has always irked me, and I personally handle it differently, i.e. no line breaks and no content mixing.
I felt like this was an Atlas-specific hack, easily remedied by the user taking action after dumping. However, I also felt that, done properly, one could easily address this with regular expression grouping and get user-customizable behavior. However, regular expressions are considered "one step too far" right now.

When inserting, should control codes be ignored for all tokens, or just end tokens requiring hex representation?

The only control codes that are currently implemented ― and again in a fashion that, for simplicity, makes it impossible to implement a literal "\n" as opposed to the control code '\n' ― are line breaks, and as per 2.3 they are to be ignored by insertion tools:

Code: [Select]
These codes are used by dumpers only and will be ignored by inserters.
An additional burden here is the different line-end control codes used by different OSes. Basically, we might have 0x0D 0x0A, or 0x0A, or 0x0D. This also favors completely ignoring line ends, because it cannot be assured that some text editing program doesn't silently convert from the dumping standard to the OS standard, in which case the insertion tool would not find the proper bytes in the text file.
On the other hand, "OS-independent" ReadLine functions do exist and will, in the worst case, read two lines instead of one for 0x0D 0x0A. Therefore, by ignoring the number of line breaks and empty lines, we actually gain a little bit of stability here.
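As a tiny sketch of that reading rule (my own illustration, not part of the standard): normalize whatever line ends a text editor may have left behind and then drop them entirely, so the inserter sees the same character stream either way.

Code: [Select]
def read_script(path):
    """Read a dumped script while ignoring line breaks and empty lines completely."""
    with open(path, "rb") as f:
        raw = f.read()
    # 0x0D 0x0A, lone 0x0A and lone 0x0D all collapse to nothing.
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return b"".join(normalized.split(b"\n")).decode("utf-8")   # assuming a UTF-8 dump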

The standard makes no definition of what constitutes a "name".

This should currently match [^,]*, i.e. any string that does not contain a comma. I would be willing to settle for [0-9A-Za-z]* in the light of not wanting to deal with Unicode confusables or different canonical decompositions of accented letters etc.

As for uniqueness of labels, I have to admit I was silently going for uniqueness in each type, but this might have to be discussed again.

While on the topic of uniqueness, it might be worth including a note in the standard that restricts the definition of a "unique" entry to the current logical table.

Good point :)

Eh? Oh. Hmm. Somehow I was under the impression that the standard supported multiple logical tables within the same physical file. At the very least, it isn't explicitly forbidden, and I see no difficulty in supporting it (having already done so myself) if we impose two more constraints:
1) a table's ID line must precede any entries for that table
2) if a file contains multiple tables, every table in that file after the first table must have an ID line
(in fact I require an ID for every table used, but I don't feel that condition should be imposed as part of the standard)

I think this did not occur to anybody, simply because one file per table and one table per file is the way it has always been. I feel we should leave it that way and be specific about it.

Linked Entries
Attempting to insert a linked entry which lacks its corresponding <$XX> in the insert script should generate an error under 4.2, right?

Exactly. It would basically be a sequence that cannot be inserted according to general insertion rules. Like for instance "福" cannot be inserted when it is not defined in any table.

Under the "wishing for the moon" category:
It would be nice if we could define linked entries in such a way that we could obtain output with both pre- and post-fix
[...]

[W]e lack a general escape character, you couldn't determine which commas were field delimiters and which were data :(. It might also be nice to be able to specify how raw bytes were output in case you don't want the <$XX> format.

Both of these ideas were sacrificed for the sake of simplicity :(

So if instead of "A<Color><$A4><$34>a" we only have "A<Color><$A4>a", you'd want to insert "A0 E0 A4 21"? How does this interact with table switching? Continuing my table switching example, what does the insertion process look like when instead of "<Color><$25><$25>C1", we're trying to insert "<Color><$25>C1" starting with table1?

I couldn't find your Color example. Nevertheless, parsing of this would become a search for a normal entry, because a linked entry cannot be found. If a normal entry exists, the whole thing is obviously not ambiguous and insertion will progress as one would expect. On the other hand, when no normal entry can be found for the label that was inadvertently misplaced, the insertion tool will not know what to do and should produce an error.

Ideally, if I dump a bunch of hex to text and then reinsert that text unaltered, my basic expectation is to get the original hex again.

This expectation is a false premise, really. You open an image file saved with one program and save a copy in another program. You will likely find that the compression changed, or the picture was silently up-converted from 16-bit RGB to 32-bit RGB, etc. What matters in the end is that the final result is equivalent. Now in the case of text, this means the following:

You dump text and insert it back. It should display as the same thing when the game is running. When it does not ― barring edge cases we explicitly forbade, such as <$xx> ― it just means your table file matches two entries to the same text which is really not the same text. A common example would be having two different text engines that happen to share parts of their encoding (or were even built from one encoding!). In such cases, 0x00 might be displayed as "string" in engine A and entry 0x80 might be displayed as "test" in engine B. However, as long as the rendering of the other code fails in one engine, the strings must not be considered the same to begin with, because they are not. In such cases, dumping text per engine might be an option or marking entries with custom tags etc.

7
Personal Projects / Re: The Console Tool (by Low Lines)
« on: November 24, 2010, 09:18:56 am »
I find it quite unlikely they figured out the format. They probably took them from this topic (or a page with all the GIFs from that topic), or some other source of manually ripped animations.

Indeed, loadingNOW figured the format out and had to crack the key they are encrypted with, or so I heard. http://pokeguide.filb.de/

cYa,

Tauwasser

8
Looks like the designation for the color is embedded somewhere within the tile.

Umm, excuse me Captain Obvious, but this is what graphics are all about. Encoding color information... It is this very "designation for the color" which you are altering all along.

I'm editing the titlescreen for Monster Capsule GB and when I was done doing some work on it, I loaded it into Rew and there was some new problem which wasn't there before.


Your description doesn't match what you did at all... I dug out the original title screen (left) and marked some tiles (right):



Notice how the tiles in the region I outlined in red are assigned a different palette than the rest of the tiles that make up the word モンスターカプセル. This is evident from the different shade of yellow/brown the red arrow is pointing at. In order for you to employ this seemingly free-to-use space, you will have to find out how the palettes get assigned for the title screen and change the outlined tiles' palette association to the one you want. This is nothing you did "wrong" while overwriting the tiles in Tile Molester, just to clarify.

cYa,

Tauwasser

9
Script Help and Language Discussion / Re: Japanese Translation Questions
« on: November 14, 2010, 08:50:18 pm »
You opted for the wrong なる. There are the following words to be found in a casual dictionary look-up:

  • なる (成る; 為る)
  • なる (生る)
  • 鳴る (なる)

This looks like a case of the first one.

cYa,

Tauwasser

10
Personal Projects / Re: Romance of Forgotten Kingdom (DOS) Translation
« on: November 03, 2010, 05:48:29 pm »
I'd like to keep the Chinese characters, but have the title below and the company name in English, all in fonts that fit in. Maybe also a translation note next to the company logo. The image has to be edited in PCX format. Please send me a PM if interested.

You should probably provide translations for anybody who feels up to the challenge and cross-post in Screenshots.

cYa,

Tauwasser

11
Then you got the opposite of me. Works on bgb with accuracy enabled, broken on VBA.

You did not enable accuracy. While there is an accuracy option in the settings, it just emulates timing more accurately. The real options are in the exceptions menu and should all be set to emulate "as in reality". You will then see the real issues.
However, the accuracy settings will not fix the problem with banks 0x22 and onwards jumping to bank 0x00 after vblank.

Well it seems I can't download either the patch or the readme from here, but you can get the file from my website which includes both a readme and a walkthrough: http://jaytheham.com/ntknb/
I just tested the patch from my site with BGB and it works flawlessly.

Well the issues are still there, as it's exactly the same patch. The readme doesn't seem to be included in the archive on RHDN. While we're at it, you should convert both the readme and the walkthrough to UTF-8.

cYa,

Tauwasser

12
The patch doesn't seem to work. It seems to overwrite all ROM bank numbers at 0x4000 from ROM bank 0x22 onwards. With that fixed, the vblank code won't switch back to the wrong ROM bank. However, the text is overwriting itself and looks very messed up in the naming screen. You probably do something in your code that the hardware doesn't support or that you expect to work differently.

I tested in BGB with the accurate emulation settings on. Here's what I get:



I hope you can fix that so it will be a fully working release :) Also: No readme? What's up with that?

cYa,

Tauwasser

13
We discussed this in the past. The biggest issue is as you brought up too, getting permission from the authors. I'm sure 9 out of 10 wouldn't mind, but I can already smell the drama the remaining ones would raise. :'(

Exactly. And this fear is not ill-founded. Suppose you were a hacker that mastered a patch. Now some guy comes along and reworks it into a different format. For whatever reason the patch stops working correctly. Who gets the blame for putting a corrupted patch out there? Who has to deal with newbies complaining that it just doesn't work? Right, the original author.

Now you say MD5 is gonna save the day. In theory this is all good; however, as soon as some author didn't specify the correct ROM to use, or even specified it incorrectly, the guy converting the patch will have to find that out and make sure. Of course, it can be argued that the community would only benefit from this; however, it's a hellish job for one single person and, sure enough, some small mishap will slip in, and then the whole converted patch set can be questioned and you're back to people relying on the old patch format.

Well, I was hoping that since the patch does the same thing, it wouldn't count as a derivative work [...].

Well, it's not so much a derivative work, as it's not protected by copyright of any kind, since it's an open-source patch format and the information contained in it is most likely not the IP of the patch creator ― though it may be the IP of the creator of the game to be patched. Also, I edited many readmes and re-uploaded whole archives to convert them to UTF-8. Nobody made a big deal out of it then, because this was basically a service to the community, so everybody can read the readmes. I doubt it would stir up many feelings at first; yet, when one case is discovered that angers some hacker, you will probably have half the community on your back.

However, say for instance one hacker has something in particular against xdelta and doesn't like his work distributed that way ― much the same as distributing pre-patched ROMs is frowned upon ― who's right and who's wrong in that case? Surely, I personally think you could do whatever you like with my patches, as it's out of my control, but I wouldn't exactly like you to convert them from a higher format to a lower format (ergo one without checksums etc.).

Also, you haven't quite brought anything new up:

(such as expansion) where a soft ips patch would fail.

IPS patches can handle ROM expansion quite well. They can't handle more than 24-bit addresses, though.

Plus it contains the checksum of itself and the original rom.

So do NINJA-format patches. Why weren't they adopted, although the format has been out for some years now?

Also, one of the many requirements of patches that checksums don't actually address right now is the need for cross-patching in many hacks. For instance, Super Mario hacks may be patched incrementally: one patch may change the background mechanism, while another changes objects. As long as the offsets for code don't overlap, everything will work fine and actually as expected by the user! Tools that display warning messages because of non-matching checksums will just confuse the user here!

Identical. I think there would be tremendous bandwidth savings, plus reduced headaches for end users.

Well, bandwidth isn't really a problem for most patches hosted by this site, I'd argue. The really big patches for CD-based games etc. aren't even hosted here. At least I can remember one distinct news entry directing to another site because the patch weighed in at a whopping several hundred megabytes.

I'd say the community stayed with IPS for such a long time because it meets the requirements of this community. Most games don't need addressing of more than 24 bits' width. Also, many people don't want checksum errors to pop up every so often, because that's detrimental to the whole process of cross-patching.
Of course, hacking work for newer platforms can't rely on IPS anymore, but this site doesn't handle that use case. Also, I believe most CD-based patches already use xdelta, because it was the de-facto standard by the time hacking those console games began.

cYa,

Tauwasser

14
Front Page News / Re: Site: ROMhacking.net and Data Crystal Merge!
« on: September 10, 2010, 04:36:00 pm »
Should be fixed and everyone can relax as "Pokémon" now displays correctly again!

Finally! Now I shall find other, more interesting ways to break this :] :D

cYa,

Tauwasser

15
Personal Projects / Re: The Console Tool (by Low Lines)
« on: September 01, 2010, 08:08:12 am »
DirectX isn't platform-independent, so it probably won't happen.

cYa,

Tauwasser

16
The new one has italics while the old one doesn't. He didn't overwrite the file; the paths are different while the file names are indeed the same.
I personally feel that the new one is better. Now it just needs proper holes in the As. Just compare the A to the R. Usually the hole in the R should be only a slight bit bigger, if at all, not like that.

cYa,

Tauwasser

17
Personal Projects / Re: Code Naturalizer
« on: August 29, 2010, 03:23:23 pm »
Couldn't a tiny bit of devious coding break this entire method?  Like this example (it's GB asm, but applies universally):

from http://web.archive.org/web/20070807121547/http://www.bripro.com/low/obscure/index.php?page=hko_sm3s:
Quote
Additionally, they do other tricks using jump opcodes which will fool even BGB's disassembly. However, stepping through and using an interactive disassembler as reference on the side, it only slows down hacking, it could never stop it!

Here's an example:
How the disassembler sees it:
 1234: xx xx  jr 1237
 1236: xx xx  (a data byte of an opcode with operands)
 1238: <error invalid opcode>
How the real code is:
 1234: xx xx  jr 1237
 1236: xx     .db $xx
 1237: xx xx  (a valid opcode)

Sorry to disappoint you, but this is neither devious coding nor an attempt to fool disassemblers. The Z80GB has varying opcode widths, and this usually works out such that code doesn't necessarily start aligned with the preceding code. It's a problem in the disassembly logic of BGB that makes it unable to display code starting from an offset it considers invalid or already interpreted differently, even when specifying the offset by hand! As long as the code naturalizer only follows jumps and interprets pointers and doesn't try to automatically disassemble all code, you will be fine, because the actual jumps will go to 0x1237.

cYa,

Tauwasser

18
Site Talk / Re: Back button kills a post in progress.
« on: August 25, 2010, 10:36:30 pm »
This would be up to the browser. My Firefox does indeed keep form contents, however not on dynamically created forms. It's probably an option somewhere; try about:config :)

cYa,

Tauwasser

19
Personal Projects / Re: Code Naturalizer
« on: August 20, 2010, 04:46:59 pm »
I thought your approach was to make hacking easier. I understood your initial post to mean it would go from ASM to naturalized code and then back, so the hacker could tinker with the naturalized code.

IMO just naturalizing isn't worth programming. Opcodes are there for a reason: they are concise and exact. Anybody with half a brain will be able to read them just as easily as ― if not more easily than ― your naturalized code, simply because a defined and concise syntax pays for itself. Even three lines of your naturalized code will become incomprehensible and unreadable.

So basically, you want to achieve a way to output ASM code in a naturalized way to help people understand it. It's not worth the effort and nobody will be able to properly learn from it anyway. Abstraction is the way to go. Understanding what a piece of code does is not the sum of its parts. Just look at the old threads Rai had going on ― sometimes as many as three for the same piece of code/topic ― and he understood all opcodes, yet couldn't find the abstract routine they implemented or what they accomplished.

cYa,

Tauwasser

20
Personal Projects / Re: Code Naturalizer
« on: August 20, 2010, 01:30:59 pm »
Bad example, as it doesn't translate back into ASM.

cYa,

Tauwasser
