Romhacking.net

Romhacking => ROM Hacking Discussion => Topic started by: pannenstazhia on January 27, 2022, 05:10:53 pm

Title: Help getting started with Arabic hacking
Post by: pannenstazhia on January 27, 2022, 05:10:53 pm
Hi everyone,

I hope you're all doing well. I have recently become interested in trying my hand at some alternative Arabic translations of several games, but it seems that the WYSIWYG tools that exist for other languages aren't as widely available for Arabic. I've tried messaging Pixel Arab and q8fft (in Arabic), two folks who have done SNES translation hacks (the platform I'm looking at), for advice, but to no avail so far. I also searched the forums and mostly found auto-threads announcing finished translations, but no information about how they were carried out.

I'm aware that there is no doubt some very complex hacking involved in making this work. But if there's any good-will in the community to share resources FOSS style to allow for more people to participate and contribute, I would love to try my hand at it. Does anyone have experience with this or is anyone aware of tools that could be used for specific games (like Final Fantasy VI or Chrono Trigger) to do text insertion with the Arabic font engines already built? I only ask because I have seen a few hacks, especially RPGs, which seem to share similar features/techniques across different creators, which makes me wonder if a sort of 'baseline' of text injection has already been developed.

Thanks and apologies if this isn't the right place for this question.
Title: Re: Help getting started with Arabic hacking
Post by: FAST6191 on January 28, 2022, 12:00:30 pm
Tool-wise, you are really going to be on your own. Hopefully something is open source enough that you can just edit what is essentially a text file and press go on a compiler (modern Windows/Unicode handlers tend to be pretty good about things here).

Most Arabic translations run into trouble in two areas.

1) Right-to-left text. Outside of modern PC games, probably no game supports this out of the box, and there is no chance with 8-16 bit era games.

2) Letters changing shape depending on their position in a word (at the end of a word, for instance). I know there is some leeway for computer text here, but I note it anyway.

For 1), it is a twist on variable width fonts, except you probably want to work in reverse (harder for code that struggles with subtraction), or possibly build some kind of automatic shifter that calculates line length. The "get it done" method usually involves manually adding enough spaces that the text appears RTL, which takes up considerably more space than the original. That is also why a particular regional release is sometimes picked as the base: French tends to be the longest of the common languages games are localized into, so many Arabic translations have used the French version, which leaves the most room.
Spaces aside, this is one of the reasons WYSIWYG tools will not support things here; the other is font support, though it is easy enough to overwrite the font and use another table. It also means you get thrown in the deep end immediately. For a Japanese game going into English, the game might well already have English characters and most text will be left to right, so you only need to figure out how to dump the text and go from there. Arabic has all that, plus editing characters, plus possibly editing the code responsible for handling the text (about as complex as text hacking tends to get before you hit special cases and particularly annoying games).
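A minimal Python sketch of the "get it done" spaces method, assuming a fixed-width engine that only draws left to right and a text box whose width in characters is a made-up parameter: the line is reversed so the engine draws its last character first (leftmost), then padded with leading spaces so it ends at the right edge. Those leading spaces are the extra space cost mentioned above.

```python
BOX_WIDTH = 16  # hypothetical text box width, in fixed-width characters


def pad_rtl_line(line: str, width: int = BOX_WIDTH) -> str:
    """Fake right-to-left output for an engine that only draws left to right.

    `line` is in logical (reading) order and assumed to be already shaped.
    The engine draws index 0 at the left edge, so the line is reversed and
    right-aligned with leading spaces until it ends at the right edge.
    """
    if len(line) > width:
        raise ValueError("line does not fit in the text box")
    visual = line[::-1]  # the last logical character ends up leftmost
    return visual.rjust(width)  # leading spaces push the text to the right edge
```

For example, pad_rtl_line("abc", 5) gives "  cba", with Latin letters standing in for Arabic glyphs.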

There is no baseline method across games, give or take some games with sequels made soon after the first or sometimes games by the same developer. Everything is generally considered a unique case.

So yeah, if you want to learn what it would likely involve, go look up what is available on variable width fonts, as it is 90% the same logic: take the next character, fetch its width (if the font is not fixed-width), add it to the running total, write it to OAM/BG/whatever part of the graphics handles it, and if the total is greater than the size of the screen or text box, start a new line; repeat until the end of the screen. RTL is then: subtract from a total, place the character accordingly, keep subtracting until the total is less than or equal to the left edge, then start a new line (with maybe the perk of not having to do a width lookup). The shifter approach will be harder with character-by-character decoding, but you would have the text read LTR, as the game and hardware are inclined to handle it, calculate the line length, and add enough manually to shift it across.
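The LTR and RTL variable width loops described above can be modelled roughly like this. It is a Python sketch of the logic, not SNES code; the glyph widths, box width and the layout name are all hypothetical. It returns, for each character, the line number and x pixel position where a renderer would draw it.

```python
from typing import Dict, List, Tuple

# Hypothetical per-glyph pixel widths, as a variable width font table would store them.
WIDTHS: Dict[str, int] = {"a": 6, "b": 5, "c": 4, " ": 3}
BOX_WIDTH = 16  # hypothetical text box width in pixels


def layout(text: str, rtl: bool = False, box: int = BOX_WIDTH) -> List[Tuple[int, int, str]]:
    """Return (line, x, char) placements for a simple VWF text box.

    LTR: x starts at 0 and each glyph's width is added to the running total.
    RTL: x starts at the right edge and each glyph's width is subtracted
         before drawing, so the glyph's right side touches the previous one.
    A new line starts whenever the next glyph would not fit.
    """
    placed = []
    line = 0
    pos = box if rtl else 0
    for ch in text:
        w = WIDTHS[ch]
        if rtl:
            if pos - w < 0:  # would cross the left edge: wrap
                line, pos = line + 1, box
            pos -= w
            placed.append((line, pos, ch))
        else:
            if pos + w > box:  # would cross the right edge: wrap
                line, pos = line + 1, 0
            placed.append((line, pos, ch))
            pos += w
    return placed
```

The RTL branch is the same loop with the sign flipped and the starting point moved, which is why the variable width font material transfers so directly.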
Title: Re: Help getting started with Arabic hacking
Post by: RodMerida on January 29, 2022, 03:16:48 am
I'd like to do the same as you for Persian, pannenstazhia (and later eastern Kurdish), with the help of native speakers I know, of course.

I once translated a game I had programmed myself into Arabic, Persian and Kurdish (written in Arabic script), and what I did was invent a notation system in the Latin alphabet to indicate each form of each letter. Letters in Arabic script change form depending on whether they come after certain letters, sit at the end of a word, or stand isolated.

So, for example, كتاب (kitāb, book) was written ktaB, but ابو (abū, father) was written abw. اخي (akhī, my brother) would be axî, and حليب (halīb, milk) would be HliB. In Persian, آب (āb, water) was written âB, مى باره (mibāre, it rains) was written mybare, حرف مى زنه (harf mizane, speaks) would be written HrF myznE, and پنير (panir, cheese) would be pnir, with i and not y. But همين (hamin, "this" in Persian) and هذه (hādhihi, "this" feminine in Arabic) would be hmiN and hðe, because I transliterated hā ه in isolated position as e, not h. The Arabic alphabet has no letter for the vowel "e": it's represented either with the diacritic kasra (بِ) or with other letters, like ه hā at the end of a word in Persian, and sometimes alif ا in Arabic, depending also on regional dialect pronunciation.

So this is a method of transliterating from the Latin alphabet to Arabic. I think it's exact enough for a computer to convert.

So you redraw the whole alphabet into the game, make your own TBL as I described, and start editing by typing the Arabic in the Latin alphabet. You may make some mistakes at first, but eventually you'll get used to it.
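A TBL file maps hex values to the text you type, and an inserter uses it to turn a script into bytes. A minimal Python sketch of that encoding step, assuming a hypothetical table where each Latin key stands for one Arabic glyph form redrawn into the font (the byte values and the multi-character "la" entry, e.g. for a lam-alef tile, are made up for illustration). It parses HEX=text lines and encodes by greedy longest match, so multi-character keys also work.

```python
from typing import Dict

# Hypothetical TBL: each Latin key stands for one glyph form redrawn into the font.
TBL_TEXT = """\
80=k
81=t
82=a
83=B
84=H
85=l
86=i
87=la
"""


def parse_tbl(text: str) -> Dict[str, int]:
    """Parse 'HEX=text' lines into a text -> byte value mapping."""
    table = {}
    for raw in text.splitlines():
        if not raw or "=" not in raw:
            continue
        hex_part, value = raw.split("=", 1)
        table[value] = int(hex_part, 16)
    return table


def encode(script: str, table: Dict[str, int]) -> bytes:
    """Encode by greedy longest match, as multi-character TBL entries require."""
    longest = max(len(k) for k in table)
    out = bytearray()
    i = 0
    while i < len(script):
        # Try the longest possible key first, then shorter ones.
        for n in range(min(longest, len(script) - i), 0, -1):
            chunk = script[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            raise ValueError(f"no table entry for {script[i]!r} at position {i}")
    return bytes(out)
```

With this table, encode("ktaB", parse_tbl(TBL_TEXT)) gives the bytes 80 81 82 83. Keys are case-sensitive, which is what lets a notation like B vs b pick different glyph forms.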
Title: Re: Help getting started with Arabic hacking
Post by: Anime_World on January 29, 2022, 07:12:06 am
Just decrement the tilemap write address in the write loop routine instead of incrementing it.
(https://cdn.imgpaste.net/2022/01/29/KyDUt3.md.png) (https://www.imgpaste.net/image/KyDUt3)
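A Python model of what that change does, assuming a SNES-style 32x32 background tilemap of 16-bit entries stored row by row, with a bytearray standing in for VRAM and addressed in bytes. write_text and its parameters are hypothetical names for illustration, not the routine in the screenshot. The only difference between the two directions is the sign of the step applied to the write address.

```python
TILEMAP_COLS = 32  # one 32x32 SNES BG screen is 32 tiles wide
ENTRY_BYTES = 2    # each tilemap entry is a 16-bit word


def write_text(vram: bytearray, base: int, row: int, col: int,
               tiles: list, rtl: bool = False) -> None:
    """Write tile numbers into a tilemap starting at (row, col).

    LTR increments the write address by one entry per character;
    RTL decrements it, so the first character lands at `col`
    and each following character goes one tile further left.
    """
    addr = base + (row * TILEMAP_COLS + col) * ENTRY_BYTES
    step = -ENTRY_BYTES if rtl else ENTRY_BYTES
    for tile in tiles:
        vram[addr] = tile & 0xFF             # low byte: tile number bits 0-7
        vram[addr + 1] = (tile >> 8) & 0xFF  # high byte: remaining tile bits and attributes
        addr += step
```

Starting at column 31 with rtl=True writes the first character at the right edge of the row and works leftward.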
Title: Re: Help getting started with Arabic hacking
Post by: RodMerida on January 30, 2022, 07:37:01 am
It's not very understandable, man. Very hard to read.

And how do you edit the text? In what alphabet do you type it, and with what kind of editor?
Title: Re: Help getting started with Arabic hacking
Post by: Hemlock on January 30, 2022, 10:45:16 am
A while back, a fan translation of Chrono Trigger into Arabic was released. If you can find the team that did it, asking them would be worth it.
Title: Re: Help getting started with Arabic hacking
Post by: VicVergil on January 30, 2022, 12:34:38 pm
pannenstazhia probably got in touch with us (https://www.etrdream.com/) eventually (that, and I was involved with q8fft on the FFVI translation, among others), but in the interest of anyone asking the same question, here is the general process. It applies strictly to Arabic and Persian; other RTL languages have other complications.

- Arabic is written right to left, which means one of two solutions: ASM programming to change the code, or no programming changes. Mixed-language text is rarely used, but when it is, it needs so-called BiDi (bidirectional) support, and I don't think you can afford to do that on a retro game.
- Arabic uses so called reshaping. The logical forms (Unicode Arabic) call different glyphs for the same letter depending on context within a word. The contextual glyphs have separate codepoints in Unicode (Unicode Arabic Presentation Forms A/B) that can be called directly, but it's inadequate for other Arabic-script languages. Similarly, you can either have the game handle reshaping for you (ASM changes) or you preprocess the text instead.
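A minimal Python sketch of the reshaping preprocessing step, for illustration only (it is not the algorithm used by any tool in this thread): it covers a handful of letters, ignores the lam-alef ligature and diacritics, and treats anything outside its table as non-joining. Each letter's form is chosen from whether it connects to its neighbours: dual-joining letters can connect on both sides, while right-joining ones (alef, dal, reh, waw) never connect to the letter after them. The output uses Presentation Forms-B codepoints and stays in logical order, so RTL handling is still a separate step.

```python
# Subset of letters -> (isolated, final, initial, medial) in Arabic Presentation
# Forms-B. Right-joining letters have no initial/medial forms (None).
FORMS = {
    "\u0627": (0xFE8D, 0xFE8E, None, None),      # alef
    "\u0628": (0xFE8F, 0xFE90, 0xFE91, 0xFE92),  # beh
    "\u062A": (0xFE95, 0xFE96, 0xFE97, 0xFE98),  # teh
    "\u062D": (0xFEA1, 0xFEA2, 0xFEA3, 0xFEA4),  # hah
    "\u062F": (0xFEA9, 0xFEAA, None, None),      # dal
    "\u0631": (0xFEAD, 0xFEAE, None, None),      # reh
    "\u0643": (0xFED9, 0xFEDA, 0xFEDB, 0xFEDC),  # kaf
    "\u0644": (0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0),  # lam
    "\u0645": (0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4),  # meem
    "\u0646": (0xFEE5, 0xFEE6, 0xFEE7, 0xFEE8),  # noon
    "\u0647": (0xFEE9, 0xFEEA, 0xFEEB, 0xFEEC),  # heh
    "\u0648": (0xFEED, 0xFEEE, None, None),      # waw
    "\u064A": (0xFEF1, 0xFEF2, 0xFEF3, 0xFEF4),  # yeh
}


def joins_forward(ch: str) -> bool:
    """True if ch can connect to the letter that follows it."""
    forms = FORMS.get(ch)
    return forms is not None and forms[2] is not None


def reshape(word: str) -> str:
    """Replace each known letter with its contextual presentation form."""
    out = []
    for i, ch in enumerate(word):
        forms = FORMS.get(ch)
        if forms is None:  # unknown character: pass through unchanged
            out.append(ch)
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        link_prev = joins_forward(prev)
        link_next = joins_forward(ch) and nxt in FORMS
        if link_prev and link_next:
            cp = forms[3]  # medial
        elif link_prev:
            cp = forms[1]  # final
        elif link_next:
            cp = forms[2]  # initial
        else:
            cp = forms[0]  # isolated
        out.append(chr(cp))
    return "".join(out)
```

For كتاب (kitāb) this gives initial kaf, medial teh, final alef and isolated beh, since alef does not connect forward.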

The translation process is similar to that for any Latin-script language, with all contextual glyphs included in the font (this may require ASM changes to expand the font, as with Skyblazer) and any cosmetic changes that might be required (reworking menu layouts and cursor positions to be RTL, generally tilemap/sprite data changes). I once added direct Unicode codepoint support to a SNES game for the hell of it, but it makes no difference to the end user, so we generally use "replacement characters": whatever the game's table supports (we may put Unicode Presentation Forms B characters directly in the tbl file if the text inserter supports UTF-8, or use Latin/CJK characters as stand-ins for Arabic, etc.). We then change the graphical font data to match those replacements, so the text might say "A" while the font graphic for "A" is "ء", and you can piece together fully legible sentences that way.
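The replacement-character approach boils down to a character substitution applied after reshaping. A minimal Python sketch, assuming a hypothetical game table with spare Latin slots; the actual assignments depend on which characters the game's table and font let you repurpose, and the font tile behind each Latin slot would be redrawn as the listed Arabic glyph form.

```python
# Hypothetical assignment: presentation form -> Latin slot the game already supports.
# The font tile behind each Latin slot gets redrawn as the corresponding Arabic glyph.
REPLACEMENTS = {
    "\uFE80": "A",  # hamza, isolated form (the "A" -> "ء" example)
    "\uFEDB": "B",  # kaf, initial form
    "\uFE98": "C",  # teh, medial form
    "\uFE8E": "D",  # alef, final form
    "\uFE8F": "E",  # beh, isolated form
}
TO_GAME = str.maketrans(REPLACEMENTS)


def to_game_text(shaped: str) -> str:
    """Swap each shaped Arabic glyph for its replacement character.

    Characters without an entry (spaces, punctuation) pass through unchanged,
    so the output can then go through the game's normal table and inserter.
    """
    return shaped.translate(TO_GAME)
```

The reshaped kitāb (kaf initial, teh medial, alef final, beh isolated) comes out as "BCDE", which the game stores and draws like any other Latin text.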

Arabic translation tooling was disastrous until recently. There was even some gatekeeping, with buggy nametable editors presented as valid text editors while information about better tools that support copy/paste was hoarded. The overrepresentation of 8-bit software shouldn't necessarily be seen as the natural state of things for Arabic; the Persian fan translation output for PS2 and more recent software should be more than enough proof of that.
There was this valiant effort (https://www.romhacking.net/utilities/1012/) (disclosure: I was involved in testing and feature suggestions), but the project had its limitations and was discontinued. There were other tools, but most weren't fit for purpose (they worked well with modern games that supported Arabic directly, but some of their implementations were lacking). We (Asgore for the most part) redid our tools ahead of the release of our Zelda Breath of the Wild translation and made them more batch-conversion friendly; while powerful for our needs, they're still not general-user friendly by any means. We still rely on regular expressions occasionally.

A preliminary version of the tool can be found here (https://github.com/asgore-undertale/Asgore-Studio), but its Python version is being discontinued and reworked. The executable in the build is behind the latest changes. (While implementing Persian for a friend who's working on a certain unannounced project, some Persian-specific bugs appeared. That's also when we realized other languages won't be supported by our current implementation unless it's massively reworked, but that's alright either way. Right now it supports all languages covered by the Presentation Forms A block.)

The most important part is the config menu here, that specifies if the tool needs to preprocess the text for RTL, BiDi, or reshaping. The options below are for games with absolutely no ASM changes (simplest case), and an arbitrarily defined character replacement table.

(https://cdn.discordapp.com/attachments/898818398669127730/935782204993335296/unknown.png)

That character replacement table can be defined here (.act file extension), then it's saved as a new file and loaded in the previous window.

(https://cdn.discordapp.com/attachments/898818398669127730/935781681292525588/unknown.png)

There are other options like escape codes for direct hexadecimal values, or various ways of handling diacritics (depending on whether the game's rendering engine allows it). I think the most important one that might cause problems is the one for line breaks, but the defaults should work.
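One reason line breaks deserve care when preprocessing for a game with no ASM changes: if the whole string is reversed for RTL display, the order of the lines flips too ("ab\ncde" would come out as "edc\nba"). A minimal Python sketch, assuming a hypothetical brk parameter stands in for the game's line break code and that the text is already reshaped: split on the break, reverse and right-align each line on its own, then rejoin so lines stay top to bottom.

```python
def rtl_lines(text: str, width: int, brk: str = "\n") -> str:
    """Prepare multi-line text for an engine that only draws left to right.

    Reversing `text` as a whole would also reverse the order of the lines,
    so each line is reversed and right-aligned separately instead.
    `brk` is a stand-in for whatever line break code the game uses.
    """
    lines = text.split(brk)
    for line in lines:
        if len(line) > width:
            raise ValueError(f"line too long for a {width}-wide box: {line!r}")
    return brk.join(line[::-1].rjust(width) for line in lines)
```

For example, rtl_lines("ab\ncde", 4) gives "  ba\n edc": each line reads right to left, but the first line is still on top.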

The complete game script can be copied here and the tool will convert it just fine.

(https://cdn.discordapp.com/attachments/898818398669127730/935782402297565204/unknown.png)

Quote from: RodMerida
It's not very understandable, man. Very hard to read.

And how do you edit the text? In what alphabet do you type it, and with what kind of editor?
And... I can't say I'm not pleasantly surprised to see RodMerida's interest in Arabic. I was interested in working on Dragon Quest 3R for a while, but working from a buggy English version as a base wasn't a very attractive prospect, so I was really thankful for his recent bugfix patch. I'm glad I could help you in some way.
If you have any questions in general that might be of help to you please don't hesitate to ask.
Title: Re: Help getting started with Arabic hacking
Post by: RodMerida on January 31, 2022, 02:02:09 am
Quote from: VicVergil
And... I can't say I'm not pleasantly surprised to see RodMerida's interest in Arabic.

I am a graduate in Arabic philology, man. And I'm living in Iran right now (my wife is Iranian). I'm learning and practicing Persian and Kurdish every day. Everybody here speaks Kurdish among themselves all the time; on top of that, my mother-in-law only speaks to me in Kurdish, lol, so I have to use my basic Persian to notice the differences and understand however I can. Most of the time I answer in Persian, which everybody knows and understands, and most people speak. I like all three languages very much. Arabic is a very mathematical language, and very old too: its classical or standard form has not changed much in centuries, and you can read texts from today or from as far back as the 6th century or so (for example qasidas, that is, classical poetry), and from very vast regions, even from Spain (al-Andalus). It's very interesting to me. Since it's a language that still preserves the distinction between long and short vowels in its spoken form, its poetry strongly reminds me of classical Latin and Greek poetry, based on rhythmic patterns.
So yes, you could say I'm interested in Arabic, and in Arabic script.

I see many retro games starting to be translated into Arabic little by little, at least the most important or famous ones, like Final Fantasy VI and Chrono Trigger. On that front I'm not so worried, because I see a bunch of audacious native speakers doing the job, despite the difficulty. Besides, Semitic languages are harder. But it's a pity that almost no retro games have been translated into Persian, let alone into Kurdish in Arabic script (if there are any Kurdish ones, they would be in the Latin alphabet, made by speakers of the Northern variety of Kurdish, Kurmanÿi, which is very different from the one spoken here). So I'm looking for competent native speakers to port some games. I found one for Persian, who is translating a whole script from Spanish into his own language for me, in a Microsoft Word document. If he ever finishes, the problem will be typing that into the game.

Yes, you could be of great help to me.
You also have all my blessings for translating Dragon Quest 3 into Arabic. I encourage you. And if I can help you with any aspect of ROM hacking that game (for example, menus), I'm here for whatever you need!
Goodbye, my friend.