Topic: Adding support for Shift-JIS in hex editor HxD

nex

« on: February 10, 2019, 02:38:26 am »
Hello,

I am the author of HxD, and am currently looking into implementing UTF-8 support.

While doing that I also had another look at supporting Shift-JIS, which I have been asked about by a couple of members of the ROM hacking community.

So far however I found no way to represent Shift-JIS text in a hex editor, since there is no way to know where a string starts. And as that information is necessary to distinguish the lead bytes from the trail bytes, this is essentially a deal breaker.

Furthermore, even if you find the start byte, you can get out of synch again because of bad bytes (such as random binary data separating actual Shift-JIS strings), and you cannot recover because most lead and trail bytes share the same values (i.e., are not distinguishable).

So even if you added some kind of correcting offset to find the right start for a string, it would only be correct for a portion of the visible file.
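
A minimal Python sketch of the lead/trail ambiguity described above. It assumes Python's built-in "shift_jis" codec; the helper name decode_from is made up for illustration. The sample text is chosen so that every trail byte also falls in the lead-byte range, which means decoding from the wrong offset produces different but plausible-looking characters instead of obvious errors.

```python
# Fullwidth "ａｂｃｄ" in Shift-JIS is 82 81 82 82 82 83 82 84:
# every trail byte (0x81..0x84) is also a valid lead byte.
text = "ａｂｃｄ"
data = text.encode("shift_jis")


def decode_from(buf: bytes, offset: int) -> str:
    """Decode buf as Shift-JIS starting at offset (hypothetical helper).

    Undecodable bytes are replaced instead of raising an error.
    """
    return buf[offset:].decode("shift_jis", errors="replace")


# Offset 0 and 2 are aligned with the real characters, offset 1 is not.
for offset in range(3):
    print(offset, data[offset:].hex(), repr(decode_from(data, offset)))
```

Nothing in the bytes themselves tells a viewer that offset 1 is the wrong one, which is the core of the problem.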

In other words, what text (that the file actually contains) you can see and what not is really pretty unpredictable, and so I wonder if there is even much use for showing Shift-JIS in a text column of a hex editor.

Not to mention that handling text entry is not trivial either. Where does a string start? Where do you start to parse? You can't always do it at the start of the file for performance reasons (and even if you did, you could get out of synch because of bad bytes, or because various strings are aligned differently). And you can't just start at the beginning (or a little before) of the visible hex dump, since that would cause all kinds of oddities while scrolling (since you keep changing the point of reference).

Quite a few hex editors claim to support Shift-JIS, but I wonder how this should be possible (without just basically displaying randomly changing results as you scroll or do other things like selecting etc.).

If anybody has some insights into these issues please comment, or if I am in the wrong sub-forum please move it, or refer me to the right sites/people to ask.
« Last Edit: February 10, 2019, 02:44:30 am by nex »

abw

« Reply #1 on: February 10, 2019, 05:28:29 am »
In the ROM hacking world, generally speaking, it is not safe to assume that any ROM file is composed entirely of text (or even if it is, there's no guarantee that text will be encoded in any standard encoding), so even for fairly identifiable encodings like UTF-8, automatic text detection is pretty much doomed to failure from the start, which appears to be the conclusion you've also reached.

If automatic detection can't work, then my personal opinion is that letting the user select a range of bytes and then say "display these bytes according to text encoding X" is the next best alternative both in general and specifically for ROM hacking. If you wanted to, you could even provide an option for saving the per-file encoding info to disk somewhere, thus making the user's encoding selection persistent between editing sessions.
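
As a rough illustration of that idea, here is a small Python sketch of per-range encoding annotations that can be saved next to a file and reloaded later. Everything in it (the EncodedRange class, the JSON layout, the function names) is an assumption made for the example, not something HxD provides.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EncodedRange:
    """One user annotation: 'display these bytes according to encoding X'."""
    start: int      # first byte offset (inclusive)
    end: int        # end offset (exclusive)
    encoding: str   # a Python codec name, e.g. "shift_jis"


def decode_range(buf: bytes, rng: EncodedRange) -> str:
    """Decode only the annotated bytes; bad bytes become U+FFFD."""
    return buf[rng.start:rng.end].decode(rng.encoding, errors="replace")


def save_ranges(ranges, path):
    """Persist annotations as JSON (hypothetical sidecar file)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(r) for r in ranges], f, indent=2)


def load_ranges(path):
    """Reload annotations saved by save_ranges."""
    with open(path, encoding="utf-8") as f:
        return [EncodedRange(**item) for item in json.load(f)]
```

Because each annotation carries its own start offset, the lead/trail question is settled by the user for that range, and bytes outside any range can stay in the default single-byte view.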

As a general-purpose hex editor, having an option to interpret an entire file in some particular text encoding seems like a good idea, since there are lots of text files that use only one encoding. If the file being viewed doesn't happen to be in that encoding, then you've got various options for error handling. As long as the user has the ability to easily scroll by one byte at a time, I think I actually prefer the "randomly changing results as you scroll" option; it's efficient in that you only need to work with the visible hex dump rather than the entire file, the user is (for encodings like Shift-JIS or UTF-8, anyway) never more than a couple of bytes away from aligning the visible hex dump with the encoded data, and if there are bad bytes mixed in with correctly encoded text, scrolling by a couple of bytes is all it takes to bring the text after the bad bytes into alignment (at the cost of mis-aligning the text before the bad bytes).

FAST6191

« Reply #2 on: February 10, 2019, 08:13:30 am »
For the sake of others playing along at home
http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml
The same also applies (and would be desirable) for EUCJP so might as well do that too http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml

Most such editors with claims of support tend to do fixed length stuff, hence the turning to mush (and back again) when an 8 bit (or odd multiple thereof) section appears in the mix, possibly before going back again.

Input wise it only gets worse, especially if you care about ROM hacking stuff as most game implementations of shiftJIS don't do the unicode support that the PC standard technically employs. Roman (and many other) characters are available in the list above, and most games needing Roman characters (and basic hackers looking to do something quick) will do the same.
Even big boy Japanese/Asian word processors like JWPCE/JWPxp https://mijet.eludevisibility.org/JWPxp/JWPxp.html have a separate input button for these to force it into that.

Nobody should be editing anything for real in a hex editor though -- basic analysis, search, find-replace, simple functions (flip, shift...) and a number of things you can reasonably type as bytes (or copy and paste I guess) being what they are typically held as being for. To that end if you did want to just grab the data from rikai (or even the tables here) and stuff it in as an extra encoding at forced 16 bit lengths then that would do. Anybody needing such a shift will probably then pick a suitable point to delete a byte to shift what they want to see into view as it were. If you wanted to stick a force decode from next 8 bit alignment button similar to the ideas of the word processors mentioned in the line above then that would sort most people out.
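
A minimal Python sketch of the "forced 16 bit lengths plus an alignment button" approach. It assumes Python's "shift_jis" codec stands in for the lookup table; decode_pairs and its shift parameter are hypothetical names, with shift=0 or shift=1 playing the role of the alignment button.

```python
def decode_pairs(buf: bytes, shift: int = 0, placeholder: str = ".") -> str:
    """Decode buf as fixed two-byte Shift-JIS codes, starting at offset shift.

    Each pair is looked up on its own, so one bad pair never affects
    its neighbours. Pairs that are not a single two-byte character are
    shown as a placeholder.
    """
    out = []
    for i in range(shift, len(buf) - 1, 2):
        pair = buf[i:i + 2]
        try:
            ch = pair.decode("shift_jis")
        except UnicodeDecodeError:
            ch = placeholder
        # Two single-byte characters in one pair are not a 16-bit code.
        out.append(ch if len(ch) == 1 else placeholder)
    return "".join(out)


sample = b"\x00" + "テスト".encode("shift_jis")  # one stray byte, then text
print(repr(decode_pairs(sample, shift=0)))  # misaligned view
print(repr(decode_pairs(sample, shift=1)))  # aligned after "pressing the button"
```

The display stays stable while scrolling as long as the view always starts pairs at the same parity, which is the main appeal of this simple scheme.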

If you did want to get more advanced then most things are going to be at least four characters long and only start with 8,9, E or F (mostly 8 and 9 as well) so you could try some kind of probabilistic thing with shifts. While it would be nice I sense this would be something of a departure from the simple but solid thing that has made HxD such a hit over the years -- I can't see how you will do it without some kind of radical retooling of the highlighting command (granted most I have seen try it just have some kind of disconnect between them).
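
The heuristic above can be sketched in Python as a scan for runs of plausible two-byte characters. The byte ranges are the usual Shift-JIS ones (lead 0x81-0x9F and 0xE0-0xFC, trail 0x40-0x7E and 0x80-0xFC); the function names and the greedy strategy are assumptions for illustration, and random binary data can still produce false positives.

```python
def is_lead(b: int) -> bool:
    """Lead bytes start with 8, 9, E or F (0xF0-0xFC are vendor extensions)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_trail(b: int) -> bool:
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def find_candidate_runs(buf: bytes, min_chars: int = 4):
    """Return (offset, char_count) for runs of >= min_chars two-byte codes.

    Greedy scan: the first alignment that yields a long enough run wins.
    """
    runs = []
    i = 0
    while i + 1 < len(buf):
        j = i
        while j + 1 < len(buf) and is_lead(buf[j]) and is_trail(buf[j + 1]):
            j += 2
        count = (j - i) // 2
        if count >= min_chars:
            runs.append((i, count))
            i = j  # continue after the run
        else:
            i += 1  # try the next byte as a possible start
    return runs
```

The returned offsets could seed the starting points for a text column or a structure view, with the user free to override them.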

If you care to court ROM hackers for some reason then a simple shiftJIS and EUCJP to the decode options will do you well, basic custom table/encoding support would make you a lot of friends, we have options for relative search but if you wanted..., your one time competitor in mirkes.de tinyhexer* has a nice distribution analysis (though I like hex workshop's more) and a list of shifts, rotations, boolean logic and flips on par with something like hex workshop or 010 will make you endless ones. I forget which one it was but I did also see the option in a hex editor once to flip individual bits by way of a checkbox in the top right of the toolbar.

*many years ago I did a shootout of all the hex editors I could find. hxd, mirkes.de tiny hexer, https://sourceforge.net/projects/hexplorer/ and xvi32 if you counted the scripting combined made something as potent for most of my ROM hacking purposes as hex workshop (and possibly 010 editor). For ROM hacking purposes like the tables mentioned above then we had a few more on top of that.

nex

« Reply #3 on: February 10, 2019, 09:13:51 am »
Quote
In the ROM hacking world, generally speaking, it is not safe to assume that any ROM file is composed entirely of text (or even if it is, there's no guarantee that text will be encoded in any standard encoding)
That's probably true for most files that are not pure text files (standard encodings aside). I mostly ask here because people seem to have used Shift-JIS quite a bit, and I wonder what their experiences are, especially with how hex editors behave.

Quote
So even for fairly identifiable encodings like UTF-8, automatic text detection is pretty much doomed to failure from the start, which appears to be the conclusion you've also reached.
The user selects which encoding should be used to show/decode text (in the entire file). For UTF-8 it takes quite some effort to get everything right, but if I didn't get anything wrong in the design (I'll see when implementing) it should be possible to do it in a predictable and reliable way, as opposed to Shift-JIS. This is mostly because UTF-8 is self-synchronizing.
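
To illustrate what self-synchronizing means here: UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so from any offset a decoder can skip forward to the next character boundary. A minimal Python sketch (utf8_sync is a made-up helper name, not part of any library):

```python
def utf8_sync(buf: bytes, pos: int) -> int:
    """Return the first offset >= pos that is not a UTF-8 continuation byte.

    Continuation bytes match 0b10xxxxxx, so they can never be mistaken
    for the start of a character.
    """
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


data = "héllo 日本".encode("utf-8")
start = utf8_sync(data, 8)          # offset 8 is in the middle of "日"
print(start, data[start:].decode("utf-8"))
```

Shift-JIS has no equivalent test, because a byte such as 0x82 can be either a lead byte or a trail byte.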


Quote
the user is (for encodings like Shift-JIS or UTF-8, anyway) never more than a couple of bytes away from aligning the visible hex dump with the encoded data
Depending on how the person reached that spot, the "random" display (i.e., a display that depends on the correcting offset) might prevent manual scanning of the text column for interesting parts.
At least I do scan through files (for a reasonable distance) manually, to find text in the vicinity (but still often over several "pages") that might give me a first idea of where to look more closely for interesting parts.

Quote
If you did want to get more advanced then most things are going to be at least four characters long and only start with 8,9, E or F (mostly 8 and 9 as well) so you could try some kind of probabilistic thing with shifts. While it would be nice I sense this would be something of a departure from the simple but solid thing that has made HxD such a hit over the years -- I can't see how you will do it without some kind of radical retooling of the highlighting command (granted most I have seen try it just have some kind of disconnect between them).
Such a thing could probably be done as a part of a structure view. Automatic finding of structures (here Shift-JIS strings), could use statistical information about the datatype to find the starting offset as you mentioned.

As a result you'd have overlaid structure views where you can change the offsets, or manually set them.
I also think that this might be technically the best approach: define a starting point in the hex view (manually or automatically), where a string is supposed to start, then have it interpreted according to that.

Currently however I am interested in a generic solution for encodings such as Shift-JIS that have the lead byte = trail byte problem. So that visually scanning the text column with your eyes is somewhat useful and intuitive.


Quote
Even big boy Japanese/Asian word processors like JWPCE/JWPxp https://mijet.eludevisibility.org/JWPxp/JWPxp.html have a separate input button for these to force it into that.
I downloaded and ran it, but I have never used such a program. Could you point out the relevant steps that show what you mean?
« Last Edit: February 10, 2019, 09:35:18 am by nex »

FAST6191

« Reply #4 on: February 10, 2019, 01:30:53 pm »
Just to make things even more fun I have seen DS games use 16 bit ShiftJIS but with 8 bit markup, end of line/section or character indicators, to say nothing of placeholders. I have also seen it vary within the same file (Zombi Daisuki I think that was, had text files for text rather than anything too custom).

At the top in the main toolbar are four buttons
あ A J 漢

Depending upon which one you click you will get a different input method/output character in the main window when tapping on the keyboard. One of those will be the Japanese encodings of the Roman alphabet: 82 4F through 82 9A, plus some other 82 stuff for punctuation (links because why not: http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml or indeed https://www.romhacking.net/documents/179/ should you prefer a simple list; your web browser, if you force encoding to shiftJIS, should be able to handle it if you don't want to grab one of the table programs here).

Depending upon your font setup it could well be fixed width if you are using the "Japanese ASCII" mode, or something like https://www.romhacking.net/utilities/504/ will also be able to tell you from the final file.

As for visually scanning I will go back to leaving a button to define the starting point as either 0 bits or 8 bits for the purposes of a 16 bit character interpretation. If ROM hackers are your choice of people to cater to then we are all fairly used to pressing page down a lot in a tile editor, only to go back to the top and do it again in one of the other tile/graphical modes the system might afford, possibly before doing each of those again with a different arrangement or starting point.
It would also work in the case of markup and placeholders.

nex

« Reply #5 on: February 10, 2019, 01:58:39 pm »
Ok, thanks.

Quote
Even big boy Japanese/Asian word processors like JWPCE/JWPxp https://mijet.eludevisibility.org/JWPxp/JWPxp.html have a separate input button for these to force it into that.
I am still unclear what you mean by "forcing you into that". I type letters in JWPxp, and it picks up a certain glyph from a font. Depending on which mode I am in (which button is pressed on the toolbar), the meaning of a keystroke changes (i.e., another glyph gets printed).
But I don't see how that relates to Shift-JIS synchronizing problems or IMEs/autocomplete.

P.S.: Why does every post need to be approved by a mod? Couldn't find any topic saying when this ends.
« Last Edit: February 10, 2019, 02:11:04 pm by nex »

FAST6191

« Reply #6 on: February 10, 2019, 03:20:38 pm »
For most normal purposes shiftJIS supports ASCII text. Most PC programs, web browsers and whatever else will support this all day long. Console games however when they started using shiftJIS tend to omit the low value stuff but they will leave the Roman characters in the upper range discussed earlier. This means anybody seeking to type a bit of Roman character using language into a console game with text encoded in shiftJIS will typically have to use the upper range stuff (or redo the font/character/encoding handling for the game). Most keyboards/computers/whatever will assume when you press A on your keyboard that you mean the character in ASCII represented by 41 hex which is no good for most games I have seen with shiftJIS.

Rather than messing around with keyboard shortcuts or hoping some random computer has an IME properly set up, stuff like JWPCE, njstar and such will forcibly reinterpret the normal keyboard entries as the equivalent stuff high up in the range if you put it in the relevant mode. Other modes will be for stuff like phonetically writing Japanese characters or using their Romanised equivalents, and further options there will give you dictionaries of various sorts and things like certain Japanese character sets but you can probably skip those unless you want to learn Japanese.

It has no bearing on the synchronising/the "do I start at 0 or 8 bits to decode this character" aspect of this discussion.

As far as approval I have no idea what goes these days but we did have some spammer issues a while back so probably a result of that. Hopefully an admin sees it and can manually set your user or something.

nex

« Reply #7 on: February 10, 2019, 04:18:11 pm »
Quote
For most normal purposes shiftJIS supports ASCII text. Most PC programs, web browsers and whatever else will support this all day long. Console games however when they started using shiftJIS tend to omit the low value stuff but they will leave the Roman characters in the upper range discussed earlier. This means anybody seeking to type a bit of Roman character using language into a console game with text encoded in shiftJIS will typically have to use the upper range stuff (or redo the font/character/encoding handling for the game). Most keyboards/computers/whatever will assume when you press A on your keyboard that you mean the character in ASCII represented by 41 hex which is no good for most games I have seen with shiftJIS.
I see. You could also solve that by defining an encoding derived from Shift-JIS that has no ASCII and translates normal letters in Unicode to the "Roman characters in the upper range discussed earlier" in Shift-JIS.
Then there would be no need for special handling, and you could just rely on a (possibly externally defined) encoding. Maybe like Thingy tables or something like that.
You could also use IMEs as provided by Windows, together with autocomplete to select from a set of precomposed forms (since some games apparently expand single bytes to entire strings).
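
The derived-encoding idea can be sketched in Python by mapping ASCII digits and letters to their fullwidth Unicode forms before encoding with the standard "shift_jis" codec. This lands them in the 82 4F to 82 9A range mentioned earlier in the thread. The function names are hypothetical, and only digits and letters are handled; punctuation would need its own entries.

```python
import string

# ASCII digits and letters and their fullwidth forms (code point + 0xFEE0).
_ASCII = string.digits + string.ascii_uppercase + string.ascii_lowercase
_FULLWIDTH = "".join(chr(ord(c) + 0xFEE0) for c in _ASCII)

_TO_FULLWIDTH = str.maketrans(_ASCII, _FULLWIDTH)
_TO_ASCII = str.maketrans(_FULLWIDTH, _ASCII)


def encode_game_text(text: str) -> bytes:
    """Encode text so Roman letters and digits use two-byte Shift-JIS codes."""
    return text.translate(_TO_FULLWIDTH).encode("shift_jis")


def decode_game_text(data: bytes) -> str:
    """Decode Shift-JIS and fold fullwidth letters and digits back to ASCII."""
    return data.decode("shift_jis").translate(_TO_ASCII)


print(encode_game_text("HxD").hex())  # two bytes per letter, all starting with 82
```

With a mapping like this the user types on a normal keyboard layout and the editor never emits the single-byte ASCII values that such games do not support.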

RyanfaeScotland

« Reply #8 on: March 13, 2019, 04:02:53 pm »
Won't spam up the thread too much but will take the opportunity to say thanks for the great editor and even more for coming out to solicit opinions on the features you think the community would find useful, that's really cool.  :thumbsup: :thumbsup:

theflyingzamboni

« Reply #9 on: March 14, 2019, 09:02:39 pm »
If I can make a sort of related feature request, would you be willing to add support for user-defined encoding tables, like WindHex does? HxD is my go-to hex editor for most things, but I do all my hacking on a game that defines its own encoding. So any time I need to search text, I have to use WindHex, which is a rather clunky old hex editor that feels much worse to use, but has the advantage of letting users load encoding tables defined in text files. I would love, love if you added this feature to HxD so that I didn't need a secondary editor.
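
For readers unfamiliar with such tables, here is a minimal Python sketch of loading and applying a user-defined encoding table. It assumes a simplified text format with one "HEX=text" entry per line; real table files used by ROM hacking tools have more features than this, and the function names are made up for the example.

```python
def parse_table(text: str) -> dict:
    """Parse lines like '41=A' or '0203=[end]' into a bytes -> str mapping."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or unrecognised lines
        key, value = line.split("=", 1)
        key_bytes = bytes.fromhex(key.strip())
        if key_bytes:
            table[key_bytes] = value
    return table


def decode_with_table(buf: bytes, table: dict, unknown: str = ".") -> str:
    """Decode buf using the table, preferring the longest matching entry."""
    max_len = max(len(k) for k in table)
    out = []
    i = 0
    while i < len(buf):
        for n in range(max_len, 0, -1):
            chunk = bytes(buf[i:i + n])
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            out.append(unknown)  # no entry starts here
            i += 1
    return "".join(out)


table = parse_table("00=A\n01=B\n0203=[end]\n")
print(decode_with_table(b"\x00\x01\x02\x03\xff", table))
```

Searching for text then amounts to inverting the table and looking for the resulting byte sequence.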
ROM wasn't hacked in a day.