Author Topic: Table File Standard Discussion  (Read 18553 times)

rveach

  • Newbie
  • *
  • Posts: 22
Re: Table File Standard Discussion
« Reply #60 on: March 13, 2012, 04:15:49 pm »
How is this different from what some of these guys do?

For the same reason you made (2.5) Linked Entries and (2.6) Table Switching.
You could do those things with a (2.2) normal entry (excluding infinite loops), but it requires A LOT of writing depending on how complicated things get. So it saves the time of writing out each entry by hand, avoids user errors that could result from manual typing, and reduces the size of the table file.

For a game that uses SJIS, a complete table of that SJIS requires ~67 KB and ~6800 lines (I will show my file if you need to see it), and that doesn't include the custom codes the game uses. I only used ASCII in my examples above, for simplicity.

So allowing Character Maps would reduce the file size and make a table file faster to understand (if you're reading someone else's), while memory usage would probably only increase slightly. The only extra memory would be from making sure you don't overwrite anything you defined in the table specifically. The character map entries will be in memory regardless of whether they are manually entered or generated, so they don't add any extra memory.

Is defining creation of the map part of defining the map?

I'm not sure what you are saying here.
If you mean the 'user custom map', then it is defining the ordering and letters in the map on that one line.
If you mean the normal maps, then maybe look at my code below to see when the map is defined in memory.

How do you propose something like this be implemented code wise?

Well, there must be some way to programmatically generate a list of the printable characters in a specific character map, but I haven't found one with Google yet. If it's not possible, then it's up to the implementations to build the most popular character maps into their programs, or to support multiple DLLs that contain them, thus allowing users to add their own maps and the less popular ones.
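One possible way to generate such a list, sketched here in Python on the assumption that the host language ships a decoder for the encoding, is to brute-force every candidate byte pair and keep whatever decodes cleanly. The names `build_sjis_table` and `as_table_lines` are made up for illustration, and Python's `shift_jis` codec may not match the exact variant a given game uses.

```python
def build_sjis_table():
    """Map each two-byte Shift-JIS code to its character by trial decoding."""
    table = {}
    # Lead-byte ranges for double-byte Shift-JIS characters.
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xF0))
    for lead in lead_bytes:
        for trail in range(0x40, 0xFD):
            if trail == 0x7F:  # never a valid trail byte
                continue
            try:
                text = bytes([lead, trail]).decode("shift_jis")
            except UnicodeDecodeError:
                continue  # unassigned code point in this codec
            if len(text) == 1:
                table[(lead << 8) | trail] = text
    return table


def as_table_lines(table):
    """Format entries as 'hexadecimal sequence=text sequence' lines."""
    return [f"{code:04X}={char}" for code, char in sorted(table.items())]
```

Writing `as_table_lines(build_sjis_table())` to a file would give a bundled "standard mapping" table without typing any entry by hand.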

This is my first crack at how to implement the character maps (which may look far from perfect lol), but there may also be a better way:
Code:
init list storage
init removal list

main loop:
    read line

    if (line is a character map support)
        parse starting byte
        parse length of character map //if it is custom, otherwise we will have the length builtin
        save rest of line and parses into list storage

        continue
    end if

    **process line like normal**

    if (line identifies hex code)
        if (hex code is in the range of one of list storages) // requires looking through all of list storage
            add hex code to removal list
        end if
    end if
end main loop

foreach list storage
    add new printable characters from maps // like it was in the format: hexadecimal sequence=text sequence
        in list storage minus what is in removal list
end foreach
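
The pseudocode above could look roughly like this in Python. This is only a sketch under assumptions: the `@map START=CHARACTERS` line syntax is hypothetical (it is not part of any draft), only the "user custom map" form is handled, and the removal list is replaced by `dict.setdefault`, which has the same effect of never overwriting an explicitly defined entry.

```python
def load_table(lines, map_prefix="@map "):
    """Load 'hex=text' entries, expanding hypothetical '@map' lines last."""
    entries = {}  # explicit entries, keyed by the raw byte sequence
    maps = []     # "list storage": (starting bytes, characters) per map line

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(map_prefix):
            # Hypothetical syntax: "@map 41=ABCD" maps 41=A, 42=B, ...
            start_hex, chars = line[len(map_prefix):].split("=", 1)
            maps.append((bytes.fromhex(start_hex), chars))
            continue
        # Normal entry: hexadecimal sequence=text sequence
        hex_code, text = line.split("=", 1)
        entries[bytes.fromhex(hex_code)] = text

    # Expand the maps only after the whole file is read, so explicit
    # entries win no matter where they appear relative to the map line.
    for start, chars in maps:
        width = len(start)
        base = int.from_bytes(start, "big")
        for offset, char in enumerate(chars):
            # Raises OverflowError if the map runs past its byte width.
            code = (base + offset).to_bytes(width, "big")
            entries.setdefault(code, char)
    return entries
```

For example, `load_table(["@map 41=ABCD", "42=[bold]"])` keeps `42=[bold]` and fills in 41, 43 and 44 from the map.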
« Last Edit: March 13, 2012, 04:37:04 pm by rveach »

henke37

  • Sr. Member
  • ****
  • Posts: 337
  • Location: Sweden
Re: Table File Standard Discussion
« Reply #61 on: March 14, 2012, 06:27:59 pm »
When in doubt, just include the standard mapping as a second table file that has to be bundled with the table reader.

Nightcrawler

  • Hero Member
  • *****
  • Posts: 5734
    • Nightcrawler's Translation Corporation
Re: Table File Standard Discussion
« Reply #62 on: March 19, 2012, 11:19:17 am »
I'm afraid this standard may meet a tragic and embarrassing end.  :-[

Apparently, on Feb 6th, the draft was somehow overwritten with an incorrect version on transcorp (probably from a hasty click fest in an FTP program on my part). It does not include the majority of my changes from the 10 June draft despite being marked as such. My local copies were subsequently refreshed to match. So now I don't actually have any copy with all of the changes mentioned in this post (for some reason, some are there). I have absolutely no desire to redo all that work.

Does anyone else happen to have a copy of this from between June 2011 and Feb 6th, 2012?


EDIT: I found a copy from August 2011 in an old site backup and a copy on a USB stick from June 23rd 2011. They still don't include the changes listed from June. It seems it may have been mis-uploaded right from the start in June. :'(
« Last Edit: March 19, 2012, 05:30:00 pm by Nightcrawler »
TransCorp - Over 15 years of community dedication.
Dual Orb 2, Wozz, Emerald Dragon, Tenshi No Uta, Herakles IV SFC/SNES Translations

henke37

  • Sr. Member
  • ****
  • Posts: 337
  • Location: Sweden
Re: Table File Standard Discussion
« Reply #63 on: March 21, 2012, 01:19:56 pm »
There is always the option of trying to write it again. You could even include new features.

Nightcrawler

  • Hero Member
  • *****
  • Posts: 5734
    • Nightcrawler's Translation Corporation
Re: Table File Standard Discussion
« Reply #64 on: March 28, 2012, 02:11:00 pm »
Having nothing was unacceptable. Nine days of hell gave birth to a draft that is ready for final review and edit.

The new draft contains all features previously discussed and rules accounting for nearly all edge cases presented in this topic!

List of Changes

The only things that did NOT make it in:
  • Variable Parameter Lists
  • Control parameters accessing external tables
  • Character Map Support

These items either came too late, details were not fleshed out, or added complication was undesirable. In light of taking 2 years to flesh out what we have, the standard remains frozen and no new features will be considered at this time. Sorry.

Notes:

1. [] for Non-Normal Entries
[] were chosen for this as they were the popular choice in this topic. However, consideration should be given to <>, {}, «», or other pairings of characters (remember they are disallowed in normal entries). This is easily changeable if anybody has an opinion one way or the other. For some reason, I am drawn to {} more than [].

2. Token Insertion Mutation (from hex literal or in careful insertion)
The only edge case not fully addressed is token mutation. This is an issue specific to what happens to a stream of binary output after attempted insertion. I don't believe this can be fully addressed in the table standard. The standard can only reach as far as setting some rules or guidelines to eliminate these situations from occurring. It has done so for every case we properly addressed (raw hex and others). Because token mutation is possible even within a single valid table (example at bottom of this post), it doesn't seem possible to address all possible instances. If you have suggested passages to add to the standard that might better address this, I'd be happy to consider inclusion; otherwise, I believe it to be addressed as best as it can be within the reach of the standard.
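
A minimal sketch of how mutation can arise inside one valid table, assuming a dumper that prefers the longest matching hex sequence. The three-entry table and the `dump`/`insert` helpers are hypothetical and only illustrate the effect; they are not taken from the standard.

```python
def dump(data, table):
    """Decode bytes, preferring the longest matching hex sequence."""
    longest = max(len(key) for key in table)
    out, pos = [], 0
    while pos < len(data):
        for size in range(min(longest, len(data) - pos), 0, -1):
            chunk = data[pos:pos + size]
            if chunk in table:
                out.append(table[chunk])
                pos += size
                break
        else:
            # No entry matched: emit the byte as a raw hex literal.
            out.append(f"<${data[pos]:02X}>")
            pos += 1
    return "".join(out)


def insert(tokens, table):
    """Encode a list of text tokens back into bytes."""
    reverse = {text: code for code, text in table.items()}
    return b"".join(reverse[token] for token in tokens)


# Hypothetical table: 00=A, 01=B, 0001=C
TABLE = {b"\x00": "A", b"\x01": "B", b"\x00\x01": "C"}
```

With this table, `insert(["A", "B"], TABLE)` produces the bytes `00 01`, and dumping those bytes again yields `C` rather than `AB`: the two inserted tokens have mutated into a third.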

FAST6191

  • Hero Member
  • *****
  • Posts: 606
Re: Table File Standard Discussion
« Reply #65 on: May 02, 2012, 10:02:48 pm »
I did not miss it, but I forgot to post something.

Still, I read it several times, so thanks for that, as it refreshed me on a few concepts I had more or less dismissed in recent years. Dakuten/Handakuten and Yoon/Youyon, or diacritics in general, are usually kicked to extra font entries. I am no stranger to multiple encodings per game/file/script, but switching out for something as simple as the kana is not something I tend to see any more. I actually can not recall the last time I saw 3.3 Example 3, although I guess I also can not remember the last time I saw true word level dictionary compression either, save for some curious situations in LZ. I then pondered how I might apply it to some certainly not common situations, but situations I have encountered none the less. They will mostly be from the DS, Wii and maybe parts of the 360, which I guess is my main source of hacking work and probably why I had not seen the things above in a while (memory management is still a thing, but not half as aggressive as it had to be for the 8 and 16 bit era). For the most part consider this a +1 with some pointless waffle from me. I have no real problems or need for solutions to the things below and would be quite happy to see this implemented, beyond wondering if it is worth having a right to left reading flag, as Arabic, Hebrew and similar languages are getting a few translations nowadays (we can probably ignore tategaki though and kick it to a control code, and fairly safely assume boustrophedon (alternating directions) is a thing of the very distant past).

I have reservations about some of the insertion side of things, but that is always going to be the case, and I agree this is probably a good way to serve the most people at once. Some games support several encoding types and are quite fussy about replacing like with like, even in the same line. Zombie Daisuki for the DS used parsed plain text files with some line numberings but switched between ASCII and shiftJIS, often without ASCII or U16 fallback options (shiftJIS implementations on the DS more often than not miss out that part of the spec, and although it is better nowadays, earlier on it was a cause for a minor celebration to have the game include the Roman characters as part of the actual shiftJIS section), meaning the collision workarounds might not be ideal. Also, there might be an issue with some things having multiple end tokens (usually for line end (line breaks were different), section end, a part end and maybe a file end), but that can probably be worked around as well.

Starting with the crazy silly thing.
RIZ-ZOAWD / The Wizard of Oz: Beyond the Yellow Brick Road. I was helping out with an English to Spanish translation (US version of the rom as the base).
Some text in it was fixed length, some was standard pointer text, and the third and most interesting part was damn near scripting engine grade.
In short, each entry was a type code a few bytes long, a length value for the resulting payload, and then the payload itself if there was any (think cheat encoding methods). The payload consisted of text and control characters of various forms, as well as calls to various 2d and 3d image files and presumably a bit more than that (I did not take reverse engineering the format that far after I figured out roughly what went where). The type codes and lengths probably could have amounted to fixed length entries, which might help things, but I think I mainly mentioned this to be annoying.
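
As a rough illustration of walking a type/length/payload layout like the one described above: the field sizes here (a 2-byte type code and a 2-byte little-endian length) are pure assumptions for the sketch, since the real sizes in the game's format are not given in this post, and `parse_entries` is a made-up name.

```python
import struct


def parse_entries(blob):
    """Split a blob into (type_code, payload) pairs.

    Assumed layout per entry (hypothetical): uint16 type, uint16 length,
    both little-endian, followed by `length` bytes of payload.
    """
    entries, pos = [], 0
    while pos + 4 <= len(blob):
        type_code, length = struct.unpack_from("<HH", blob, pos)
        pos += 4
        payload = blob[pos:pos + length]
        if len(payload) < length:
            raise ValueError(f"truncated payload at offset {pos}")
        entries.append((type_code, payload))
        pos += length
    return entries
```

Only payloads whose type code marks them as text would then be handed to the table reader; everything else would have to be copied through untouched.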

Also possibly related http://blog.delroth.net/2011/06/reverse-engineering-a-wii-game-script-interpreter-part-1/

Related to the above, and back to the standard: prior to the characters being named, other names were used, with the option of them being replaced from a name entry screen (not sure how the game eventually did it), where traditionally we might have seen placeholders. I might be able to abuse control codes, but a "read only" flag of some form might be an idea. Thinking further, though, that might just be my being lazy, and half of this project seems to be about avoiding the extra cruft, so ignore that.

More recently I was pulling apart Atelier Lina ~The Alchemist of Strahl although I have seen it before in other files/formats.
Text was shiftJIS, but each section was bounded by a hex number starting from 0000 and going up from there. Short sample:
Code:
?????

錬金術の極意

踏ん張り

フラムっぽい物

みんな頑張れっ!

テラフラムっぽい物

どんぐりメテオ

It does not take too long to hit actual characters and, perhaps more troubling, control codes. In this case they were not 8 bit, but I have seen several games with hard 16 bit characters and 8 bit control codes (or similar placeholders that are not a multiple of 16 bits) that troubled more basic parsers, but that is probably not where this is heading.

Games using XML-esque structures. I certainly see XML in games far more, but it is usually something that escaped the cleanup process before building, or it is used for unrelated files. These can probably be ignored, as they tend to use known encodings and are not really the domain of table files.

Games using square brackets as escape control code/placeholder value indicators. A simple workaround I guess, but one I might have to think about (if for no other reason, I would sub in the corner* or curly brackets).

Use of the yen symbol*, I guess I could always do alt 0165 (must remember the 0) or otherwise define/bind something or if I am truly lazy copy and paste but it does not tend to appear on a European keyboard and I am lazy.

*it is not lost on me.

There was probably more I meant to say, but it appears to be nearing 3am (again).