

Messages - Tauwasser

Programming / Re: Table File Standard Discussion
« on: August 27, 2011, 04:51:35 pm »

Sorry for my long absence, but real life grabbed me by the neck... So anyway, I re-read the thread from start to finish again and I'll share my thoughts with you:

  • I'm okay with comments being gone.
  • I would like to see newlines go, too.
  • End tokens can move to the utility realm.

Table matches with hex literals

I think we should count hex literals as matches for the current table whenever that table produced (or would have produced) the literal. That is to say:
  • an unlimited table would have counted an unknown byte as a fallback case and fallen back to the table below it;
  • if the stack is empty, or the table is not unlimited, the hex literal counts towards it in the insertion direction.

This behavior mirrors dumping. It can produce output that is not correct for the game's display mechanism, but since the case is ambiguous, either choice can. This would also extend to table matches other than infinite and exactly one, should those be allowed.

Restricting table matches to infinite or one match(es)

I think this is a good restriction. All computer text encodings ever produced were flat within their code space. However, in this case, we should think about the following:

  • Many game text encodings are stateful. This is not the case for Unicode or any other traditional encodings including multi-byte encodings that I'm aware of. Even legacy multibyte encodings can be managed inside one flat table, i.e. Klarth's last example.
  • Multi-table options, including multiple match ranges activated from one table A to another table B, are solvable and usable. It is just that we decided them to be in a polynomial complexity class.

The problems we are facing are avoided by implementations that knew the restrictions on their input in advance. We don't necessarily have that luxury. Notice that the last point also indicates that I think the kanji array problem is solvable, however, possibly not in linear or quadratic time. I'm not so much concerned about memory, as we happen to have a lot in machines built after 2005.

I'm in favor of restricting the current release of the standard and extending the syntax once we find a suitable solution to a suitable problem that uses multiple matches within one table other than unlimited matches or different match ranges in the same table B reachable from some table A. So I would definitely be in favor of keeping the current tablename,№Matches syntax.

Direct table fallback

I'm in favor of direct table fallback using the !7F=,-1 syntax proposed. Not having it might require a theoretically infinite stack of tables for a given infinite string. It's a pretty big oversight and I'm happy that somebody caught it.

Algorithm selection

The current draft does not contain anything pertaining to that. So I have to ask again: we are in favor of multiple allowed insertion algorithms along with compliance levels, right? By compliance levels I mean that any implementation of the standard must indicate, for instance, "TFS Level 1 compliance" for implementing "longest prefix insertion" or "TFS Level 2 compliance" for implementing "longest prefix insertion and A* optimal insertion", and must not make up new compliance levels; those might be added in a revision of the TFS.

Control Codes

First of all, yes, linked entries should be renamed to make their purpose clearer.
Also, getting an "insertion test string" for grouping input arguments is simple using regular expressions (or some linear substitution method of your choice). For that matter, it seems we should only allow one-byte arguments to avoid endianness issues, and we can simplify the arguments to %D, %X, and %B for decimal, hexadecimal and binary respectively.

<window x=%X y=%D> can be easily turned into a match string using the following substitutions (as well as substitutions to mark user-supplied {}[] etc. as literals):

  • %D → [0-9]\{1,3\}
  • %X → \$[0-9A-Fa-f]\{2\}
  • %B → %[01]\{8\}
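As a hedged illustration (the function name and details here are my own, not part of the standard), the substitution scheme above could be sketched like this:

```python
import re

# Hypothetical sketch: turn a control code's output format (with %D/%X/%B
# placeholders) into a regex that recognizes the entry in an insert script.
# The character classes follow the substitution list above.
PLACEHOLDERS = {
    "%D": r"[0-9]{1,3}",            # one decimal byte, 0-255
    "%X": r"\$[0-9A-Fa-f]{2}",      # one hex byte, e.g. $A4
    "%B": r"%[01]{8}",              # one binary byte, e.g. %10100100
}

def format_to_regex(fmt: str) -> re.Pattern:
    # Escape everything first so user-supplied <>{}[] count as literals,
    # then swap the escaped placeholders for their capturing classes.
    pattern = re.escape(fmt)
    for ph, cls in PLACEHOLDERS.items():
        pattern = pattern.replace(re.escape(ph), "(" + cls + ")")
    return re.compile(pattern)

rx = format_to_regex("<window x=%X y=%D>")
m = rx.match("<window x=$A4 y=10>")
```

The capturing groups then yield the raw argument text ("$A4" and "10" here) for conversion back to bytes.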

Notice that we do need to keep identifiers in the output, contrary to what was suggested. If we do not do that, we are open to the following exploits:

Code:
!7E=<window x=%X y=%X>
!7F=<window x=%D y=%D>

How do you parse <window x=10 y=10> without additional context information? You can't. Also, we might want to include signed decimal and signed hex.

Variable parameter list

First of all, I must admit that I have never ever seen a game use these. Then again, it's not impossible. The old syntax would have lent itself rather well to defining these:

Code:
!7F=<window>, 256, 00
The argument length would constitute a sensible maximum argument list limit, and the optional parameter after the second comma would indicate the list end identifier. Having said this, I'm neither strongly in favor of nor strongly opposed to including variable argument lists in any form or fashion.

Regular expression table entries

I still think they are useful, but a burden to implement and make safe with the current control code set. Their main purpose was commenting, and that has been pushed to top-level utilities. For normal entries, nothing is gained by them anymore. For control codes and table options, blindly using user-controlled regex content during output might become a safety issue...

Multiple tables per file

Opposed, because of the added complexity with almost no gain.

I hope I did not miss anything that was supposed to be addressed. Also, please mark this thread as new so other regulars get an email notification about changes.



Programming / Algorithm Selection
« on: May 25, 2011, 06:40:33 pm »
New business:

I disagree with the "longest prefix" insertion rule from 2.2.4. With a slightly altered example:
the "longest prefix" rule makes it impossible to insert the string "FiveSixSeven", despite that being valid output from a dumper. A less greedy algorithm would be able to insert "12 13" instead, which seems superior to inserting "00" but losing the "Seven".

You do? I'd love to hear your algorithm, especially if we happen to add a few more table values to the mix:


Now, what are you going to insert and how are you going to determine it? ;)

That is a basic depth-first tree search, so it's not overly complicated to implement. However, as we all know, complexity is O(b^n) where n is the maximum number of levels and b is the average branching factor. Another way to think of this is a transducer, which will naturally only finish on valid paths. The only criterion is that we need to find a shortest path, not enumerate all shortest paths.
Basically, an A* search with cost in bytes and a basic bytes-per-letter heuristic will do. This could also be expanded for target language and occurrence inside the script, to accommodate the simple fact that having lots of table entries doesn't mean all of them get used with the same probability. However, basic A* with cost in bytes and even a heuristic of zero will work out the shortest path directly.

Since the heuristic must not overestimate the byte count, and already one table entry mapping 1 byte to more than 1 letter basically means bytes per letter < 1, we really deal with a (0,1) range of possible bytes-per-letter values here, so even an ideal heuristic for the source file will have little impact on finding the right way. The only nitpick is that entries like 3 bytes = 1 letter may also exist, and therefore bytes per letter would be > 1 on average (when entry probability is not taken into account). However, since a heuristic of 0.1 bytes per letter will still be admissible, because it doesn't overestimate, the (first) shortest path will still be found.

Code: [Select]

See PDF.

Cost is 1.2 at the start with h(x) = 0.1 * x where x is letters left in buffer. Simple mean for bytes per letter would be 11/30 (30 letters per 11 bytes), so we're somewhat far away from even close to optimal and we will see some jumping towards the end because of this.
  • We expand the first node (red), find that "F" and "Fi" can be used. "F"'s path cost d(x) = 1 byte and we estimate another h(x) = 1.1 bytes left to go => 2.1 total. "Fi" only costs f(x) = 2.0 per this rationale [10 letters to go at 0.1 plus 1 byte already used].
  • Expand "Fi". Find "ve" (cost 2.8) and "v" (cost 2.9).
  • Expand "F" (2.1), find "i" (3.0), "ive" (2.8), "iveS" (2.7).
  • Expand "iveS", find "ixSeven" (3.0) [algorithms halting on h(x) = 0 will stop now, orange], i (3.6).
  • (Assume we haven't stopped, black.) Expand any other node and find that its cost (because d(x) >= 3 at this point) is greater than 3.0.
  • This goes on until all 2.x nodes and one 3.0 node (depending on implementation) have been expanded once.
  • Expand "ixSeven" (green) and notice it's our goal.

Before you ask, NC, this can be programmed with a simple stack that pushes node IDs back with reordering or a list that is sorted after every insertion of node(s).

I hope to have demonstrated that this is neither a laughable nor an impossible claim or problem. However, having an admissible heuristic here is key, and a simple (unweighted) mean will most likely not do because of outliers; one would need to use the median or mode, or some analysis of the input first. The degenerate cases are h(x) = 0 for all x, which turns A* into uniform-cost search with breadth-first-like behavior, and h(x) = x * [length of longest hex sequence in table], which overestimates and behaves greedily, depth-first, at the risk of missing the optimum.
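For concreteness, here is a minimal sketch of the A* tokenization I described, in my own made-up naming, not any existing tool: path cost g is the bytes emitted so far, and h(x) = rate * letters_left stays admissible as long as rate does not exceed the table's cheapest bytes-per-letter ratio (0.1 here).

```python
import heapq

def astar_encode(text, table, rate=0.1):
    # Frontier entries: (f = g + h, g = bytes used, position, tokens so far).
    frontier = [(rate * len(text), 0, 0, [])]
    best = {}                               # cheapest g seen per position
    while frontier:
        f, g, pos, tokens = heapq.heappop(frontier)
        if pos == len(text):
            return tokens                   # first goal popped is optimal
        if best.get(pos, float("inf")) <= g:
            continue                        # already expanded pos more cheaply
        best[pos] = g
        for entry, hexseq in table.items():
            if text.startswith(entry, pos):
                g2 = g + len(hexseq)        # cost in bytes, not letters
                h2 = rate * (len(text) - pos - len(entry))
                heapq.heappush(frontier,
                               (g2 + h2, g2, pos + len(entry), tokens + [entry]))
    return None                             # no tokenization exists

# Toy table in the spirit of the example; byte values are invented.
table = {"F": [0x10], "Fi": [0x11], "ive": [0x12], "iveS": [0x13],
         "ixSeven": [0x14], "i": [0x15], "v": [0x16], "ve": [0x17]}
print(astar_encode("FiveSixSeven", table))
```

The greedy "Fi" branch dead-ends, while A* backtracks to "F" and finds the 3-byte encoding "F" + "iveS" + "ixSeven".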

In the absence of table switching, I would model my insertion algorithm [...] on perl's regular expression engine. [...] the engine is generally longest prefix, but adds the concept of backtracking - if it's not possible to complete tokenizing the string based on your current tokenization, go back a little bit and try a different tokenization.

This is basically A* over tokens with just text length as cost, like all regex engines do for greedy star. However, I'm currently not sure how this would work with the added complication that some tokens take multiple bytes, which changes the cost from being uniform across all tokenizations to a cost per tokenized token.
Having said that, a way to implement this via general purpose regex engines would probably be more accessible in more programming languages.

You've just increased complexity by 10x for an otherwise trivial task.

Neither tokenization nor optimal search is a trivial task. Indeed, defining the search problem itself mathematically is not a trivial task. That's why so much brain power (and money) went into things like SQL queries and the like.

Has anybody ever actually written an inserter that behaves like this?

I'm not sure what you're thinking of as an inserter in this case, but pretty much every compression that does not use sliding window techniques will have to use a backtracking algorithm to be optimal alongside a good heuristic. So yes, people have written inserters that insert input data into an output file while contemplating size and combination of mappings. It might just have been for binary data or the like.

Ran this one by Klarth. "Bad token selection can occur sometimes, but I'd estimate it very rare for it to be detrimental...unless it's a "gotcha table". The optimal algorithm is simply out of reach for most, and non desirable for the rest of us. Just because it may be more optimal doesn't mean it's desirable or the best choice for the job.

Not sure where the quote ends, so I assume it's all Klarth's. If so, I'd like to see him defend the POV that an optimal algorithm is non-desirable, because I cannot think of a single argument except burden of implementation, which he ruled out. Once a provably optimal algorithm is used, why is it not desirable? Speed-wise, we can brute-force some 100 kB in a few seconds, so speed doesn't seem to be the issue here, does it?

After getting bored waiting for a single medium-length string to encode, I ended up abandoning the longest prefix idea altogether and created a different optimal insertion algorithm that runs in roughly linear time and memory instead. It took a couple hours to get everything working right, but the end result is only about 50 lines of code and it chews through a 200KB script in under a second.

I'm quite interested how you achieved and proved linear time/memory. Please don't hesitate to elaborate.

While on the topic of insertion algorithms... how are you handling table switching? I've been thinking more about this, and in the general case it presents an even larger can of worms than I was anticipating.

Since I'm the guy that introduced this, I can only say that ― yes ― it does pose a problem. However, my naïve solution was to use ordered data sets and do a longest-prefix match over all tables. You can still do this with A*, where the path cost is adapted to table switching as well, i.e. the path to an entry in a switched-to table costs the bytes of the entry in the new table plus the bytes of the switching code in the old table. This, of course, just needs some bookkeeping to know which table each explored node belongs to and how many matches that table has left before switching back.

Since I have thought about it for some time now, I would actually like to introduce the notion of leveled compliance. Basically, one could incorporate into the standard more than one notion of insertion, and tools can then specify which version(s) they comply with. This way, tool x can comply with a) longest prefix, b) the proposed A* method (or maybe just a depth-first search etc.) and c) its own method. IMHO, this would also preclude utility authors from inventing their own insertion technique without indicating it anywhere. Table files themselves would be required to stay the way they are for compatibility with other tools once the final draft is done. Versioning will likely happen anyway once some <glaring oversight that really should have been caught and everybody feels miserable about once the standard is out> has been identified and rectified in a subsequent version of the standard.



Programming / Some Basic Points
« on: May 25, 2011, 06:27:51 pm »
First of all, this post will be pretty long and I would like to apologize to abw if this seems like I jump on his posts only. I originally discussed the table file standard with NC back on his own board and we pretty much figured out a way to do it. I'm also quite late to the party, which is why I cover almost every second point you make here.

except for post-processing [for table switching] of overlapping strings with bytes interpreted by different tables, which is still a mess

Can you elaborate on this one?

perhaps text sequences containing <$[0-9A-Fa-f][0-9A-Fa-f]> should be forbidden.

While developing the standard, we opted for simpler text checking algorithms, so we decided to give the user the power to do this at the cost of possible mix-ups. NC tried to tone complexity down as much as possible. Even regular expressions are considered a hindrance here, which I will elaborate on further down below. However, I would support, and have already offered, designing regular expressions for identifying entry types, so we might as well disallow it.

Also, it might be more appropriate to include the section on inserting raw hex in 2.2.4 instead of in 2.2.1.

It seemed logical to include it there, because that way, the dumping and insertion process for hexadecimal literals is completely defined, instead of breaking these two up. 2.2.4 doesn't deal with literals at all right now.

Also also, it might be worth mentioning that hex characters are equally valid in upper or lower case (e.g. "AB" == "ab" == "Ab" == "aB").

This is addressed in 2.2.1 in the following manner:

Code:
"XX" will represent the raw hex byte consisting of digits [0-9a-fA-F].

Control Codes in End Tokens Requiring Hex Representation

I admit that having these commenting operations mixed with dumped and to-be-inserted text in the same file always irked me, and I personally handle it differently, i.e. no line breaks and no content mixing.
I felt like this was an atlas-specific hack and easily remedied by the user taking action after dumping. However, I also felt that done properly, one could easily address this with regular expression grouping and get user-customizable behavior. However, regular expressions are considered "one step too far" right now.

When inserting, should control codes be ignored for all tokens, or just end tokens requiring hex representation?

The only control codes that are currently implemented ― and again in a fashion that, for simplicity, makes it impossible to represent a literal "\n" as opposed to the control code '\n' ― are line breaks, and they are, as per 2.3, to be ignored by insertion tools:

Code:
These codes are used by dumpers only and will be ignored by inserters.
An additional burden here is the different line-end control codes used by different OSes. Basically, we might have 0x0D 0x0A, or 0x0A, or 0x0D. This also favors completely ignoring line ends, because it cannot be assured that some text editing program doesn't silently convert from the dumping standard to the OS standard, in which case the insertion tool would not find the proper bytes in the text file.
On the other hand, "OS-independent" ReadLine functions do exist and will, at worst, read two lines instead of one for 0x0D 0x0A. Therefore, by ignoring the number of line breaks and empty lines, we actually gain a little bit of stability here.
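To illustrate the "ignore line ends entirely" stance (a sketch of my own, not mandated by the standard): an inserter can split dumped script text on any of the three conventions and drop empty lines, so a silent editor conversion cannot break reinsertion.

```python
import re

# Split raw dumped text on any line-end convention (0x0D 0x0A, 0x0A, 0x0D)
# and discard empty lines, so the number and style of line breaks is ignored.
def script_lines(raw: bytes):
    text = raw.decode("utf-8")
    return [line for line in re.split(r"\r\n|\r|\n", text) if line]

# All three dumps of the same script reinsert identically:
dos = b"Hello<END>\r\n\r\nWorld<END>\r\n"
unix = b"Hello<END>\nWorld<END>\n"
mac = b"Hello<END>\rWorld<END>\r"
assert script_lines(dos) == script_lines(unix) == script_lines(mac)
```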

The standard makes no definition of what constitutes a "name".

This should currently match [^,]*, i.e. any string that does not contain a comma. I would be willing to settle for [0-9A-Za-z]* in the light of not wanting to deal with Unicode confusables or different canonical decompositions of accented letters etc.
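The two candidate definitions of a "name", written out as regexes (a quick sketch, the constant names are mine):

```python
import re

# Loose rule: anything without a comma (the field delimiter).
LOOSE = re.compile(r"^[^,]*$")
# Strict rule: ASCII alphanumerics only, sidestepping Unicode confusables
# and canonical-decomposition headaches entirely.
STRICT = re.compile(r"^[0-9A-Za-z]*$")

assert LOOSE.match("Main Script")        # spaces pass under the loose rule
assert STRICT.match("Pokémon") is None   # accented letters fail under strict
```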

As for uniqueness of labels, I have to admit I was silently going for uniqueness in each type, but this might have to be discussed again.

While on the topic of uniqueness, it might be worth including a note in the standard that restricts the definition of a "unique" entry to the current logical table.

Good point :)

Eh? Oh. Hmm. Somehow I was under the impression that the standard supported multiple logical tables within the same physical file. At the very least, it isn't explicitly forbidden, and I see no difficulty in supporting it (having already done so myself) if we impose two more constraints:
1) a table's ID line must precede any entries for that table
2) if a file contains multiple tables, every table in that file after the first table must have an ID line
(in fact I require an ID for every table used, but I don't feel that condition should be imposed as part of the standard)

I think this did not occur to anybody, simply because one file per table and one table per file is the way it has always been. I feel we should leave it that way and be specific about it.

Linked Entries
Attempting to insert a linked entry which lacks its corresponding <$XX> in the insert script should generate an error under 4.2, right?

Exactly. It would basically be a sequence that cannot be inserted according to general insertion rules. Like for instance "福" cannot be inserted when it is not defined in any table.

Under the "wishing for the moon" category:
It would be nice if we could define linked entries in such a way that we could obtain output with both pre- and post-fix

[W]e lack a general escape character, you couldn't determine which commas were field delimiters and which were data :(. It might also be nice to be able to specify how raw bytes were output in case you don't want the <$XX> format.

Both of these ideas were sacrificed for the sake of simplicity :(

So if instead of "A<Color><$A4><$34>a" we only have "A<Color><$A4>a", you'd want to insert "A0 E0 A4 21"? How does this interact with table switching? Continuing my table switching example, what does the insertion process look like when instead of "<Color><$25><$25>C1", we're trying to insert "<Color><$25>C1" starting with table1?

I couldn't find your Color example. Nevertheless, parsing of this would become a search for a normal entry, because a linked entry cannot be found. If a normal entry exists, the whole thing is obviously not ambiguous and insertion will progress as one would expect. On the other hand, when no normal entry can be found for the label that was inadvertently misplaced, the insertion tool will not know what to do and should produce an error.

Ideally, if I dump a bunch of hex to text and then reinsert that text unaltered, my basic expectation is to get the original hex again.

This expectation is a false premise, really. You open an image file saved with one program and save a copy in another program. You will likely find that the compression changed, or the picture was silently up-converted from 16bit rgb to 32bit rgb etc. What matters in the end is that the final result is equivalent. Now in the case of text, this means the following:

You dump text and insert it back. It should display as the same thing when the game is running. When it does not ― barring edge cases we explicitly forbade, such as <$xx> ― it just means your table file matches two entries to the same text which is really not the same text. A common example would be having two different text engines that happen to share parts of their encoding (or were even built from one encoding!). In such cases, 0x00 might be displayed as "string" in engine A and entry 0x80 might be displayed as "test" in engine B. However, as long as the rendering of the other code fails in one engine, the strings must not be considered the same to begin with, because they are not. In such cases, dumping text per engine might be an option or marking entries with custom tags etc.

Personal Projects / Re: The Console Tool (by Low Lines)
« on: November 24, 2010, 09:18:56 am »
I find it quite unlikely they figured out the format. They probably took them from this topic (or a page with all the GIFs from that topic), or some other source of manually ripped animations.

Indeed, loadingNOW figured the format out and had to crack the key they are encrypted with, or so I heard.



Looks like the designation for the color is embedded somewhere within the tile.

Umm, excuse me Captain Obvious, but this is what graphics are all about. Encoding color information... It is this very "designation for the color" which you are altering all along.

I'm editing the titlescreen for Monster Capsule GB and when I was done doing some work on it, I loaded it into Rew and there was some new problem which wasn't there before.

Your description doesn't match what you did at all... I dug out the original title screen (left) and marked some tiles (right):

Notice how the tiles in the region I outlined in red get a different palette assigned than the rest of the tiles that make up the word モンスターカプセル. This is evident from the different shade of yellow/brown the red arrow is pointing at. In order for you to use this seemingly free space, you will have to find out how the palettes get assigned for the title screen and change the outlined tiles' palette association to the one you want. Just to clarify: this is nothing you did "wrong" while overwriting the tiles in Tile Molester.



Script Help and Language Discussion / Re: Japanese Translation Questions
« on: November 14, 2010, 08:50:18 pm »
You opted for the wrong なる. There are the following words to be found in a casual dictionary look-up:

  • なる (成る; 為る)
  • なる (生る)
  • 鳴る (なる)

This looks like a case of the first one.



Personal Projects / Re: Romance of Forgotten Kingdom (DOS) Translation
« on: November 03, 2010, 05:48:29 pm »
I'd like to keep the Chinese characters, but have the title below and the company name in English, all in fonts that fit in. Maybe also a translation note next to the company logo. The image has to be edited in PCX format. Please send me a PM if interested.

You should probably provide translations for anybody who feels up to the challenge and cross-post in Screenshots.



Then you got the opposite of me. Works on bgb with accuracy enabled, broken on VBA.

You did not enable accuracy. While there is an accuracy option in the settings, it just emulates timing more accurately. The real options are in the exceptions menu and should all be set to emulate "as in reality". You will then see the real issues.
However, the accuracy settings will not fix the problem with banks 0x22 and onwards jumping to bank 0x00 after vblank.

Well it seems I can't download either the patch or the readme from here, but you can get the file from my website which includes both a readme and a walkthrough:
I just tested the patch from my site with BGB and it works flawlessly.

Well the issues are still there, as it's exactly the same patch. The readme doesn't seem to be included in the archive on RHDN. While we're at it, you should convert both the readme and the walkthrough to UTF-8.



The patch doesn't seem to work. It seems to overwrite all rombank numbers at 0x4000 from rombank 0x22 onwards. With that fixed, the vblank code won't switch back to the wrong rom bank. However, the text is overwriting itself and looks very messed up on the naming screen. Your code probably does something the hardware doesn't support, or something you expect to work differently than it does.

I tested in BGB with the accurate emulation settings on. Here's what I get:

I hope you can fix that so it will be a fully working release :) Also: No readme? What's up with that?



We discussed this in the past. The biggest issue is as you brought up too, getting permission from the authors. I'm sure 9 out of 10 wouldn't mind, but I can already smell the drama the remaining ones would raise. :'(

Exactly. And this fear is not ill-founded. Suppose you were a hacker that mastered a patch. Now some guy comes along and reworks it into a different format. For whatever reason, the patch stops working correctly. Who gets the blame for putting a corrupted patch out there? Who has to deal with newbies complaining that it just doesn't work? Right, the original author.

Now you say MD5 is gonna save the day. In theory this is all good; however, once some author didn't specify the correct rom to use, or maybe even specified it incorrectly, the guy converting the patch will have to find out and make sure. Of course, it can be argued that the community would only benefit from this; however, it's a hellish job for one single person, and sure enough, some small mishap will slip in, and then the whole converted patch set can be questioned and you're back to people relying on the old patch format.

Well, I was hoping that since the patch does the same thing, it wouldn't count as a derivative work [...].

Well, it's not so much a derivative work, as it's not protected by copyright of any kind: it's an open-source patch format, and the information contained in it is most likely not the IP of the patch creator ― but may be the IP of the creator of the game to be patched. Also, I edited many readme's and re-uploaded whole archives to convert them to UTF-8. Nobody made a big deal out of it then, because this was basically a service to the community, so everybody can read the readme's. I doubt it would stir up many feelings at first; yet, once one case is discovered that angers some hacker, you will probably have half the community on your back.

However, say for instance one hacker has something in particular against xdelta and doesn't like his work distributed that way ― much the same as distributing pre-patched roms is frowned upon ― who's right and who's wrong in that case? Surely, I personally think you could do whatever with my patches, as it's out of my control, but I wouldn't exactly like you to convert them from a higher format to a lower one (i.e. one without checksums etc.).

Also, you haven't quite brought anything new up:

(such as expansion) where a soft ips patch would fail.

IPS patches can handle rom expansion quite well. They can't handle more than 24bit addresses though.
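That 24-bit ceiling follows directly from the record layout (a simplified sketch with a hypothetical helper; real IPS also has RLE records, the "EOF" footer, and the ambiguity of an offset that spells "EOF"): each record stores its target offset in only 3 bytes, so nothing past 0xFFFFFF can be addressed.

```python
import struct

# One plain IPS record: 3-byte big-endian offset, 2-byte length, then data.
# The 3-byte offset field is exactly the 24-bit address limit in question.
def ips_record(offset: int, data: bytes) -> bytes:
    if offset > 0xFFFFFF:
        raise ValueError("IPS cannot address past 16 MiB (24-bit offset)")
    return offset.to_bytes(3, "big") + struct.pack(">H", len(data)) + data

# A one-byte change at 0x4000 wrapped in the "PATCH"..."EOF" envelope.
patch = b"PATCH" + ips_record(0x4000, b"\x22") + b"EOF"
```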

Plus it contains the checksum of itself and the original rom.

So do NINJA-format patches. Why weren't they adopted, although the format has been out for some years now?

Also, one requirement of patching that checksums actually work against right now is cross-patching, which many hacks need. For instance, Super Mario hacks may be patched incrementally: one patch may change the background mechanism, while another changes objects. As long as the offsets for code don't overlap, everything will work fine, and actually as expected by the user! Tools that display warning messages because of non-matching checksums will just confuse the user here!

Identical. I think there would be tremendous bandwidth savings, plus reduced headaches for end users.

Well, bandwidth isn't really a problem for most patches hosted by this site, I'd argue. The really big patches for CD-based games etc. aren't even hosted here. At least I can remember one distinctive news entry directing to another site because the patch was a whopping several hundred megabytes.

I'd say the community stayed at IPS for such a long time because it meets the requirements of this community. Most games don't need addressing of more than 24bits' width. Also, many people don't want checksum errors to pop up every so often, because it's detrimental to the whole process of cross-patching.
Of course, hacking work for newer platforms can't rely on IPS anymore, but this site doesn't handle that use-case. Also, I believe most CD-based patches already use xdelta, because it was the de facto standard by the time hacking those console games began.



Front Page News / Re: Site: and Data Crystal Merge!
« on: September 10, 2010, 04:36:00 pm »
Should be fixed and everyone can relax as "Pokémon" now displays correctly again!

Finally! Now I shall find other, more interesting ways to break this :] :D



Personal Projects / Re: The Console Tool (by Low Lines)
« on: September 01, 2010, 08:08:12 am »
DirectX isn't platform-independent, so it probably won't happen.



The new one has italics while the old one doesn't. He didn't overwrite the file; the paths are different while the file names are indeed the same.
I personally feel that the new one is better. Now it just needs proper holes in the As. Just compare the A to the R: usually the hole in the R should only be a slight bit bigger, if at all, not like that.



Personal Projects / Re: Code Naturalizer
« on: August 29, 2010, 03:23:23 pm »
Couldn't a tiny bit of devious coding break this entire method?  Like this example (it's GB asm, but applies universally):

Additionally, they do other tricks using jump opcodes which will fool even BGB's disassembly. However, by stepping through and using an interactive disassembler as a reference on the side, you can work around this; it only slows down hacking, it could never stop it!

Here's an example:
How the disassembler sees it:
 1234: xx xx  jr 1237
 1236: xx xx  (a data byte of an opcode with operands)
 1238: <error invalid opcode>
How the real code is:
 1234: xx xx  jr 1237
 1236: xx     .db $xx
 1237: xx xx  (a valid opcode)

Sorry to disappoint you, but this is neither devious coding nor an attempt to fool disassemblers. The Z80gb has varying opcode widths, so code doesn't necessarily start aligned with the end of the preceding code. It's a problem in BGB's disassembly logic that it cannot display code starting from an offset it considers invalid or already interpreted differently, even when specifying the offset by hand! As long as the code naturalizer only follows jumps, interprets pointers, and doesn't try to automatically disassemble all code linearly, you will be fine, because the actual jumps will go to 0x1237.
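As a toy illustration (a made-up three-opcode subset and hypothetical function names, not BGB's actual code): a linear sweep trips over the padding byte, while following the jr target disassembles the real instruction stream.

```python
# Toy opcode subset in the spirit of the GB Z80, not the full instruction set:
# 0x18 = jr r8 (2 bytes), 0x3E = ld a,d8 (2 bytes), 0x00 = nop (1 byte).
SIZES = {0x18: 2, 0x3E: 2, 0x00: 1}

def linear_sweep(mem, pc, end):
    out = []
    while pc < end:
        op = mem[pc]
        if op not in SIZES:                  # hits the raw data byte and stops
            out.append((pc, "invalid"))
            break
        out.append((pc, hex(op)))
        pc += SIZES[op]
    return out

def follow_jumps(mem, pc, end):
    out = []
    while pc < end:
        op = mem[pc]
        out.append((pc, hex(op)))
        if op == 0x18:                       # jr: continue at the jump target
            pc = pc + 2 + int.from_bytes(mem[pc+1:pc+2], "little", signed=True)
        else:                                # (real tool would handle unknowns)
            pc += SIZES[op]
    return out

# jr +1 skips one data byte (0xFF); a valid ld a,$2A follows at the target.
mem = bytes([0x18, 0x01, 0xFF, 0x3E, 0x2A])
```

`linear_sweep` reports "invalid" at offset 2, whereas `follow_jumps` lands on offset 3 and decodes the instruction that actually executes.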



Personal Projects / Re: Code Naturalizer
« on: August 20, 2010, 04:46:59 pm »
I thought your approach was to make hacking easier. I understood your initial post to mean that you would go from ASM to naturalized code and back again, so that the hacker could tinker with the naturalized code.

IMO just naturalizing isn't worth programming. Opcodes are there for a reason: they are concise and exact. Anybody with half a brain will be able to read them just as easily as, if not more easily than, your naturalized code, simply because a well-defined, concise syntax is worth its weight in gold. After only three lines, your naturalized code will become incomprehensible and unreadable.

So basically, you want a way to output ASM code in a naturalized form to help people understand it. It's not worth the effort, and nobody will be able to properly learn from it anyway. Abstraction is the way to go: understanding what a piece of code does is more than the sum of its parts. Just look at the old threads Rai had going, sometimes as many as three for the same piece of code or topic: he understood all the opcodes, yet couldn't work out the abstract routine they implemented or what it accomplished.



Personal Projects / Re: Code Naturalizer
« on: August 20, 2010, 01:30:59 pm »
Bad example, as it doesn't translate back into ASM.



ROM Hacking Discussion / Re: Re: YouTube/Google Video thread
« on: August 12, 2010, 11:26:43 am »
I actually found that very off-putting, too. However, I thought it was just a hacky video.

It's really not all that hard to pull off. Basically, follow Gemini's advice:

I mean, all you've got to do is an extra tile shifting+DMA transfer to VRAM each time an overflow happens.

I actually have always done it this exact way. The only problem for now is that sometimes text gets rendered onto a background whose pattern differs from the text background, so you see additional "blank" space where the routine will print the next character.
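For reference, the shifting part boils down to something like this Python sketch (1bpp, one glyph row at a time; `render_glyph` and the tile layout are invented for illustration, not any game's actual routine). The tile the glyph spills into is what you would then re-upload to VRAM in the DMA step:

```python
TILE_W = 8  # GB/GBA background tiles are 8 pixels wide

def render_glyph(tiles, pen_x, glyph_row, glyph_width):
    """Blit one 1bpp glyph row at pixel position pen_x.

    tiles: list of 8-bit ints (one row per tile, leftmost pixel = MSB);
    glyph_row: glyph_width significant bits, MSB-first. Returns new pen_x.
    """
    tile_idx, bit_off = divmod(pen_x, TILE_W)
    # align the glyph to the tile's left edge, then shift to the pen position
    tiles[tile_idx] |= (glyph_row << (TILE_W - glyph_width) >> bit_off) & 0xFF
    overflow = bit_off + glyph_width - TILE_W
    if overflow > 0:
        # glyph crosses the tile boundary: spill the rightmost columns
        # into the left edge of the next tile
        tiles[tile_idx + 1] |= (glyph_row << (TILE_W - overflow)) & 0xFF
    return pen_x + glyph_width
```

For example, blitting a solid 5-pixel-wide row at pen position 5 fills the last 3 columns of the first tile and the first 2 columns of the next.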

why should an "i" take the same amount of time to appear as an "M"?

Because they're both letters all the same. That's how you read with the least distraction. If you get not-fully-rendered parts with every letter drawn, you automatically feel that something is somehow off.



Well, it's for in-game scripting routines. They probably wanted to save space by using variable-width commands instead of full words. So one command can be 6 bytes long, the next only 3, etc. Therefore pointers won't be aligned and need to be read byte-wise.
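As a sketch of why that forces unaligned reads (Python, with invented command lengths purely for illustration; this is not the game's actual command set):

```python
# Each command byte has its own length, so later commands (and any
# pointers embedded in them) land at arbitrary, unaligned offsets.
CMD_LEN = {0x01: 6, 0x02: 3, 0x03: 1}  # hypothetical lengths

def command_offsets(script):
    """Yield the start offset of each variable-width command."""
    pc = 0
    while pc < len(script):
        yield pc
        pc += CMD_LEN[script[pc]]

stream = bytes([
    0x01, 0, 0, 0, 0, 0,  # 6-byte command at offset 0
    0x02, 0, 0,           # 3-byte command at offset 6
    0x03,                 # 1-byte command at offset 9
])
```

Walking `stream` yields offsets 0, 6, and 9, none of which is guaranteed to be word-aligned, so any pointer operand inside a command has to be assembled byte by byte.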



At least on GBA/NDS, it makes no sense to do this, as it is an incredible amount of extra work that nobody would go through, seeing as there is no benefit whatsoever. Pointers should always be word-aligned, or maybe half-word-aligned (for 2-byte relative pointers), depending on how the code is set up. Is that sort of thing commonplace on the 6502 or 65816? Even so, you can still brute-force it by checking for a pointer at every location instead of limiting yourself to word-aligned ones.

While this is certainly true for a majority of games, some games do go the extra mile and load pointers byte-wise. Just look at the Pokémon franchise. I also saw it in some other Japanese ROMs, but alas, I forgot the name of that game. I think all pointers used directly by ASM routines will be aligned, since no compiler will split those up, I hope. For pointers embedded in data, though, this doesn't hold.
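For completeness, the byte-wise load itself is trivial; in Python it would look like this (a generic little-endian read, not any particular game's code):

```python
def read_u16_le(data, off):
    """Assemble a 16-bit little-endian pointer one byte at a time.

    Works at any offset, aligned or not, which is exactly what a
    byte-wise loading routine buys you.
    """
    return data[off] | (data[off + 1] << 8)

# e.g. a pointer $1234 stored starting at an odd offset:
blob = bytes([0x00, 0x34, 0x12])
```

Reading with `read_u16_le(blob, 1)` reassembles the pointer from the unaligned position without any alignment requirement.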



Programming / Re: LZE compression format analyzed
« on: August 09, 2010, 11:29:38 am »
Submit Files.

It's in the left pane. Basically, you have to host your text document (TXT, PDF, etc.) somewhere freely accessible on the web (hints as to where to host are in the help on that page) and write an entry for it. Then staff will decide whether or not it's worthwhile having on the site, partly based on how correctly the submission form was filled out.

So basically, some documentation on how it works, besides the source code, would do in my opinion. Of course, you can also stick the source code in there.



Pages: [1] 2 3 4 5