Romhacking.net

Romhacking => Personal Projects => Topic started by: baka-neko on January 06, 2018, 01:08:06 pm

Title: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 06, 2018, 01:08:06 pm
(http://neko.works/upload/555.png)

Hello,

I've made a free program that detects any Japanese text in a running game and copies it to the clipboard. You can then use third-party software such as Translation Aggregator to display the furigana (kanji readings) or even a translation!

It requires a profile for each game, including a pixel-perfect matching font.

The font should look like this:

(http://neko.works/upload/556.png)

(http://neko.works/upload/557.png)

Please try making profiles and fonts for any game or platform you like. I would love to get early feedback and suggestions. Thanks!

Download and more information on the official website:
neko.works/FBCR (http://neko.works/FBCR)

Here are a few usage examples:

(http://neko.works/upload/561.png)

(http://neko.works/upload/562.png)
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Squall_FF8 on January 07, 2018, 05:29:42 am
Hi baka-neko,

I think you have a great idea for fast translation of games that doesn't require patching the game. I can think of a number of applications for that idea!

There is one tiny detail that I didn't get:
Quote
I've made a free program that detects any Japanese text in a running game and copies it to the clipboard.
What triggers the translation?
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: chillyfeez on January 07, 2018, 09:24:23 am
A couple of questions:
Does the program, um, know what it's reading, or is it only picking up instances of the characters the user feeds into it? What I mean is, once I give it this profile of potentially hundreds+ kana and kanji, does it match them with its internal alphabet, or do I have to go through some sort of mapping process?

Is it possible to implement a function that allows the user to feed a sprite sheet (so to speak) of Japanese characters instead of individual images of each character? I don't know much about later consoles, but I know that with a lot of 8- and 16-bit games I can load the ROM into a graphics editor and instantly make a BMP copy of the entire alphabet. Copying individual characters, though, would take that same amount of time, multiplied by the number of characters used in the game, meaning hours of prep time for each new game.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 07, 2018, 09:45:17 am
Quote
There is one tiny detail that I didn't get: What triggers the translation?

Hello,

For the translation, you have 2 options:

- Real-time machine translation using Translation Aggregator, or any translation software that reads the clipboard.

- Soft translation: when FBCR detects Japanese text, it writes it to the file id_of_the_game.txt in the dumps sub-folder, with each unique sentence on a separate line. If you translate each line, save the result to a file named id_of_the_game.EN.txt, and activate the translation by adding the command translation:EN to the profile, then when you play the game FBCR will copy the matching translation to the clipboard instead of the Japanese text. You'll just need software that displays the English clipboard text, such as "clipbrd" that comes with Windows.
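The soft translation lookup can be pictured with a minimal Python sketch. It assumes that line N of the translated file corresponds to line N of the dump file, and that untranslated text falls back to the Japanese original; both are assumptions for illustration, and the function names are hypothetical, not part of FBCR.

```python
from pathlib import Path

def load_translations(dump_path, translated_path):
    """Pair line N of the Japanese dump with line N of the translated file."""
    source = Path(dump_path).read_text(encoding="utf-8").splitlines()
    target = Path(translated_path).read_text(encoding="utf-8").splitlines()
    # zip stops at the shorter file; blank translations are treated as missing.
    return {ja: en for ja, en in zip(source, target) if en.strip()}

def text_for_clipboard(detected, table):
    """Return the translation if one exists, else the detected text itself."""
    return table.get(detected, detected)
```

With a dump file and its .EN.txt counterpart loaded this way, each detected sentence is a single dictionary lookup.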
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 07, 2018, 09:54:07 am
Quote
Does the program, um, know what it's reading, or is it only picking up instances of the characters the user feeds into it? What I mean is, once I give it this profile of potentially hundreds+ kana and kanji, does it match them with its internal alphabet, or do I have to go through some sort of mapping process?

The detected Japanese text is converted to a standard UTF8 sentence that is copied to the clipboard.

Quote
Is it possible to implement a function that allows the user to feed a sprite sheet (so to speak) of Japanese characters instead of individual images of each character? I don't know much about later consoles, but I know that with a lot of 8- and 16-bit games I can load the ROM into a graphics editor and instantly make a BMP copy of the entire alphabet. Copying individual characters, though, would take that same amount of time, multiplied by the number of characters used in the game, meaning hours of prep time for each new game.

Not possible currently, but you could write a Python script or similar that will separate the characters from the sprite sheet into a format that is compatible with FBCR.

The font format in FBCR is a set of .bmp images, one per character, named 0001.bmp and up, plus a UTF-8 corpus.txt file containing all the corresponding Japanese characters on one line, in the same order as the images.
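As a rough idea of what such a script could do, here is a minimal Python sketch of the slicing and naming steps. It works on a sheet already loaded as a list of pixel rows; reading the sheet and writing the .bmp files is left to whichever image library you prefer. The fixed tile size and the four-digit zero padding are assumptions, and all function names are hypothetical.

```python
def split_sheet(pixels, tile_w, tile_h):
    """Cut a sheet (list of pixel rows) into tiles, left to right, top to bottom."""
    rows, cols = len(pixels) // tile_h, len(pixels[0]) // tile_w
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append([row[c * tile_w:(c + 1) * tile_w]
                          for row in pixels[r * tile_h:(r + 1) * tile_h]])
    return tiles

def font_file_names(count):
    """File names in the style described above: 0001.bmp, 0002.bmp, ..."""
    return [f"{i:04d}.bmp" for i in range(1, count + 1)]

def corpus_line(characters):
    """All characters on a single line, in the same order as the images."""
    return "".join(characters)
```

The characters passed to corpus_line still have to be typed in by hand, in the order the tiles appear on the sheet.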
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: chillyfeez on January 07, 2018, 10:51:57 am
Quote
The detected Japanese text is converted to a standard UTF8 sentence that is copied to the clipboard.
Hmm... Either you didn't understand my question, or I don't understand the answer. Both of my questions relate to the amount of pre-play prep the user would need to undergo before being able to play the game and have text copied to the clipboard.
What I was trying to ask is, let's say I only provide FBCR with one character: 人. Does FBCR automatically recognize that as "jin," or do I have to tell the program, "this is 'jin:'人?"
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 07, 2018, 11:10:08 am
Quote
What I was trying to ask is, let's say I only provide FBCR with one character: 人. Does FBCR automatically recognize that as "jin," or do I have to tell the program, "this is 'jin:'人?"

The font is defined by 2 parts: the .bmp images, in this case an image of 人 named 0001.bmp, and a UTF-8 corpus.txt file with all the corresponding Japanese characters on the same line. In this case it will contain just one character: 人.

You can see an example for Surging Aura here:

(http://neko.works/upload/556.png)

(http://neko.works/upload/557.png)

If you're talking about readings, dictionary definitions, etc., all of these are handled by third-party software such as Translation Aggregator.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Squall_FF8 on January 07, 2018, 11:20:22 am
Quote
- Soft translation, basically, when FBCR detects the Japanese text ...
Sorry I understood the translation from the first post. My question is more like:
What triggers FBCR detection of Japanese text?
 
Let's say I have prepared all the prerequisites. I start the game. I start all the tools (FBCR, translators, ...). While I'm playing, Japanese text pops up on the screen. How does FBCR know that there is something that needs OCR? Do I have to press a keyboard button, do I have to click something ...?
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 07, 2018, 11:29:45 am
Quote
Let's say I have prepared all the prerequisites. I start the game. I start all the tools (FBCR, translators, ...). While I'm playing, Japanese text pops up on the screen. How does FBCR know that there is something that needs OCR? Do I have to press a keyboard button, do I have to click something ...?

FBCR grabs the image from the game every 250 milliseconds and checks for changes. If there are any, it waits until 3 frames are identical, in case the text is still being drawn. After that it triggers the text detection, which is 100% visual but is not OCR. You can change both of these parameters (the interval and the number of identical frames) in the profile. So basically you don't need to do anything.

Though you can force the text detection anytime by pressing F10 - you can also change it to another function key in the profile.
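The polling behaviour described above could be sketched in Python roughly like this. It is only an illustration of the idea, not FBCR's actual code: grab stands for any hypothetical function that returns the current frame, and the sketch assumes the same interval is used both while idle and while waiting for the text to settle.

```python
import time

def wait_for_stable_frame(grab, interval=0.25, stable_count=3, sleep=time.sleep):
    """Poll grab() until the frame changes, then until it repeats stable_count times."""
    previous = grab()
    current = previous
    while current == previous:          # idle until something changes on screen
        sleep(interval)
        current = grab()
    run, last = 1, current
    while run < stable_count:           # text may still be appearing
        sleep(interval)
        frame = grab()
        run = run + 1 if frame == last else 1
        last = frame
    return last                         # stable frame, ready for text detection
```

Raising stable_count makes detection more tolerant of slow text printing at the cost of a longer delay before the text reaches the clipboard.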
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Squall_FF8 on January 07, 2018, 06:45:56 pm
Quote
After that it will trigger the text detection which is 100% visual - but is not OCR

"Optical character recognition (also optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast)" WiKi (https://en.wikipedia.org/wiki/Optical_character_recognition)
In short: an extraction of text from image is OCR :D

Anyway, I think your idea is excellent and will have its applications. Since I suspect that you do pixel matching, it's important to use filters that produce the same pixels regardless of what is around them; that is why things like bilinear interpolation should be off.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 08, 2018, 01:52:22 pm
Quote
In short: an extraction of text from image is OCR :D

Might be ;) What I wanted to say is that I'm not using any OCR SDKs or algorithms, as I'm just doing pixel perfect comparisons, with just a few tricks on colors.

Quote
Anyway, I think your idea is excellent and will have its applications. Since I suspect that you do pixel matching, it's important to use filters that produce the same pixels regardless of what is around them; that is why things like bilinear interpolation should be off.

Thanks! Yes, the game needs to be displayed clean. For instance using RetroArch, you need to disable Bilinear Filtering, shaders, and set Aspect Ratio to 1:1 PAR (to have square pixels). You can use scaling but it will need to be Integer Scales.
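To show why a clean, integer-scaled image matters, here is a minimal Python sketch of exact bitmap matching. It is not FBCR's algorithm: it assumes a single line of text, a fixed-width font laid out on a regular grid, and one exact text color, and every name in it is hypothetical.

```python
def to_mask(region, text_color):
    """1 where the pixel has exactly the text color, else 0."""
    return [[1 if px == text_color else 0 for px in row] for row in region]

def downscale(mask, scale):
    """Undo integer scaling by keeping every scale-th pixel."""
    return [row[::scale] for row in mask[::scale]]

def recognize_line(mask, font):
    """Compare each fixed-width cell with every glyph; '?' marks no exact match."""
    glyph_w = len(next(iter(font.values()))[0])
    text = []
    for x in range(0, len(mask[0]) - glyph_w + 1, glyph_w):
        cell = [row[x:x + glyph_w] for row in mask]
        text.append(next((ch for ch, g in font.items() if g == cell), "?"))
    return "".join(text)
```

Any filter that blends neighbouring pixels changes the colors, so the mask no longer equals the stored glyph and the comparison fails, which is why filtering and non-integer scaling have to be disabled.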
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Vanya on January 11, 2018, 04:49:16 pm
Pretty cool. I imagine something like this could be adapted to create an overlay in a custom emulator, not unlike the things that are possible in FCEUX with Lua scripting.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 12, 2018, 06:01:39 am
Quote
Pretty cool. I imagine something like this could be adapted to create an overlay in a custom emulator, not unlike the things that are possible in FCEUX with Lua scripting.

Thanks! As it works with the clipboard, it should be easy enough for any emulator to display the text in-game!
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: danke on January 12, 2018, 08:14:58 pm
This doesn't seem that practical for games with hundreds of kanji. Especially if they aren't in any kind of standard order.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Vanya on January 13, 2018, 03:09:22 am
Quote
This doesn't seem that practical for games with hundreds of kanji. Especially if they aren't in any kind of standard order.

True, but for many games that have a lot of technical difficulty in creating a translation this is definitely way better than nothing at all.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 13, 2018, 11:36:24 am
Quote
This doesn't seem that practical for games with hundreds of kanji. Especially if they aren't in any kind of standard order.

If the font is standard (it can be generated from a TrueType font), as in most PC games, you can build the font using the included Font Builder in just a few minutes.

If it is not, as in most console games, the good news is that once you have completed one font, you can probably reuse it for other games from the same developer or on the same platform!
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: elpan on January 21, 2018, 06:48:57 pm
Hi, first off, this is fantastic, baka-neko. I can see a use for this with some Chinese and Korean games like Wind Fantasy XX. I presume all types of fonts are possible.

I was trying to see what I can do with the PC-98 version of "Kuro No Ken". My understanding is that in the profile, width and height are for the text box area you're trying to match with the selected font, and left and top are how far the box starts from the game window. I tried to mess around with that, but the best I could get was .... or --- characters.
I then decided to check some of the examples you provided. I fired up Surging Aura in RetroArch, followed your instructions, and got the same thing. I tried setting the resolution to the native Mega Drive 320x224 and still got nothing.

Looking at the log, I see things like this: Valid Rect: LEFT:465 TOP:83 RIGHT : 31 BOTTOM : 0. I'm wondering what it all means and if it helps. Or perhaps I lack understanding of it all. Thanks.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Jorpho on January 21, 2018, 08:52:24 pm
If the game is using standard Windows text rendering, shouldn't it be possible to hook into the process directly and determine exactly what characters the game is using?  I think the NJStar CJK Viewer used to do that.  (Anyone else remember that one?)

Of course, if the game is running in an emulator or somehow using its own text renderer, that would be a different matter.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 22, 2018, 11:08:11 am
Quote
Hi, first off, this is fantastic, baka-neko. I can see a use for this with some Chinese and Korean games like Wind Fantasy XX. I presume all types of fonts are possible.

Thanks! Yes, it should be possible to use it for any language, as long as you provide the font.

Quote
I was trying to see what I can do with the PC-98 version of "Kuro No Ken". My understanding is that in the profile, width and height are for the text box area you're trying to match with the selected font, and left and top are how far the box starts from the game window. I tried to mess around with that, but the best I could get was .... or --- characters.

The first thing you need to check is the game's window name. For instance, if you're using Neko Project, the command should look like this:
window_name_starts_with:Neko Project

Then check the scale of the window. If it is native (640x400), the command looks like:
scale:1

width and height are the size of the zone where the text is; left and top are its position within the client area. To choose precise values, press F9. This generates 3 images in the same folder as FBCR.exe; check the file that ends with _full.bmp.

Sometimes you need to adjust the top value, as the image grabbed from the game changes size. Use the F9 trick to find the correct value.

Anyway, I've checked the game. It seems to use the NEC PC98 font, so I've made a quick profile for the NPC dialogues. You would probably need another profile for the intro and cut-scenes. Here's a shot:

(http://neko.works/upload/564.png)

And here's the profile:
https://www.dropbox.com/s/ax8cafxjy6aqbzz/Kuro%20no%20Ken%20%28Forest%2C%20PC98%29%20%5Btimer%2C%20bottom%2C%20white%5D.txt?dl=1 (https://www.dropbox.com/s/ax8cafxjy6aqbzz/Kuro%20no%20Ken%20%28Forest%2C%20PC98%29%20%5Btimer%2C%20bottom%2C%20white%5D.txt?dl=1)

Quote
I then decided to check some of the examples you provided. I fired up Surging Aura in RetroArch, followed your instructions, and got the same thing. I tried setting the resolution to the native Mega Drive 320x224 and still got nothing.

Same here: check the window name, make sure the scale is the same as in the profile, disable all the shaders, and do the F9 trick to check the top value.

Quote
Looking at the log, I see things like this: Valid Rect: LEFT:465 TOP:83 RIGHT : 31 BOTTOM : 0. I'm wondering what it all means and if it helps. Or perhaps I lack understanding of it all. Thanks.

This is just for debugging, the valid rect is the part of the rect that you have defined in the profile that includes at least 1 pixel with the text color.
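One possible reading of "valid rect" is the smallest rectangle inside the profile's zone that contains every pixel of the text color. The Python sketch below shows that reading only; how FBCR actually computes the rect, and how it reports the four values in the log, is not specified here, and the function name is hypothetical.

```python
def valid_rect(region, text_color):
    """Bounding box (left, top, right, bottom) of text-colored pixels, or None."""
    hits = [(x, y) for y, row in enumerate(region)
            for x, px in enumerate(row) if px == text_color]
    if not hits:
        return None                      # no pixel of the text color in the zone
    xs, ys = zip(*hits)
    return min(xs), min(ys), max(xs), max(ys)
```

Under this reading, an empty result would mean the zone or the text color in the profile does not line up with what the game displays.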

Quote
If the game is using standard Windows text rendering, shouldn't it be possible to hook into the process directly and determine exactly what characters the game is using?  I think the NJStar CJK Viewer used to do that.  (Anyone else remember that one?)

Of course, if the game is running in an emulator or somehow using its own text renderer, that would be a different matter.

Yes, standard text hooking is possible with many Windows games, but not all, and it is not possible at all with console games.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: elpan on January 22, 2018, 06:06:50 pm
Quote
The first thing you need to check is the game's window name. For instance, if you're using Neko Project, the command should look like this:
window_name_starts_with:Neko Project

Then check the scale of the window. If it is native (640x400), the command looks like:
scale:1

width and height are the size of the zone where the text is; left and top are its position within the client area. To choose precise values, press F9. This generates 3 images in the same folder as FBCR.exe; check the file that ends with _full.bmp.

Sometimes you need to adjust the top value, as the image grabbed from the game changes size. Use the F9 trick to find the correct value.

Thank you, I understand much better now and was able to test a few profiles. Kuro no Ken is working, thanks for the profile.
Some feedback: if possible, it would be great to be able to load multiple profiles at the same time or, better yet, to define multiple text zones in the same profile for the different areas where text appears. Also, parsing can seem slow sometimes.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 23, 2018, 10:30:44 am
Quote
If possible, it would be great to be able to load multiple profiles at the same time or, better yet, to define multiple text zones in the same profile for the different areas where text appears.

The current solution is to make multiple profiles for the same game for different parts of the screen, and switch profiles when needed using the icon in the notification area.

I will probably introduce shortcuts for fast switching.

Quote
Also, parsing can seem slow sometimes.

This is due to the nature of the detection, though it is possible to speed it up by reducing the size of the text zone or the number of characters in the font.

For instance, the Surging Aura profile is super fast because its font has only around 100 characters and its text zone has a low resolution.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Squall_FF8 on January 23, 2018, 11:38:54 am
Hey baka-neko, what is your connection with 'Light Fairytale'?
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 23, 2018, 11:47:27 am
Quote
Hey baka-neko, what is your connection with 'Light Fairytale'?

It's one of my work-in-progress games.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Squall_FF8 on January 23, 2018, 02:56:36 pm
Whoah ...  Today I stumbled upon a video of the game on Steam and immediately added it in my 'wishlist'! While watching a familiar name just popped up, so I had to ask :laugh:

Do you know more precisely when the release date will be? Also, do you have a video of a battle where something other than 'Attack' is used?

I have to admit I love the graphics, the music, the setting! It's pretty much what a contemporary FF should look like. Unfortunately, after FF10, all is crap. I had high hopes for "I Am Setsuna" to be the long-anticipated real FF11, but other than the graphics (which are fantastic) it didn't deliver: the music is just one instrument (piano), the battles lack the depth that FF titles had ...

Anyway I wish you good luck with finishing the game!
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 23, 2018, 03:18:38 pm
Thanks for the support!

The release was planned for the end of March, but the development is taking a bit more time than expected, so it will probably go beyond that unfortunately.

My current priority is to implement all the locations and events, as this is the most difficult part of the development. The battle system is currently quite basic, but I will re-start working on it soon, along with other gameplay mechanics such as the mini-games.

I did purchase Setsuna on Steam but haven't had the time (nor the motivation) to play it. I played a few hours of World of Final Fantasy on PS4, and think that it looked gorgeous, but the story wasn't that great, and the chibi characters were ... well, too chibi :D
Though the non-chibi characters looked beautiful, and the battle system is not bad. I wish they made a remake of FF6 (or even FF7 :laugh:) like this!
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Psyklax on January 23, 2018, 06:44:21 pm
Hey, this looks like it would be excellent for helping to translate PC-98 games, which I'm quite interested in doing, as well as PC-88 games perhaps. But I don't understand the instructions on how to get the PC-98 font: the instructions in Japanese seem to suggest you need to run a program on a real PC-98.

Also, I don't really understand the instructions on your site either. :D I've downloaded the program plus Translation Aggregator, but I can't understand what to do next. It seems like I need to provide lots of details in a profile for each game such as font names and such. There isn't much explanation on how to do this.

I'd love to use this to translate games that use clear fonts like PC-98 games, but I'm afraid I'll need more info on how to even get started with it. :)
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on January 24, 2018, 07:00:51 am
@Psyklax: Sorry for the lack of documentation :P

I think the fastest way to get started would be to check the sample profiles included with the download, and try at least one game from the list. Then check the profile with a text editor to see how it works.

Also, I've added a small section in the official website on how to adjust the sample profiles if they don't work on your setup:
http://neko.works/index.php?static=fbcr#profiles (http://neko.works/index.php?static=fbcr#profiles)
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Guadozoku on January 27, 2018, 09:54:29 pm
This would be great for DS games, since half of them use the same font.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Aeana on February 10, 2018, 06:26:29 am
I was messing around with this with the PC98 font, and I think there might be an error with how it handles widening the fonts compared to the hardware. It seems to work just fine with kana, but when it comes to kanji, it doesn't seem to want to recognize anything. I took a closer look and I think I'm starting to see why.

Here is an example from Legend of Heroes 3:

(https://i.imgur.com/H066X8J.png)

and here is the output for the kanji 大 from fb.exe with text_xmax set to 1:

(https://i.imgur.com/xlkYVdZ.png)


If you look at the Legend of Heroes 3 image, you'll see that there's a bit more open space up in the center of the kanji.  In fact, I've noticed that in any case where widening the font makes it overlap with another pixel, there is a blank pixel on the system, rather than a filled one. 

Same thing with the kanji 祝:

(https://i.imgur.com/o45ty5i.png)

It might be easier to see with this one. You can see in the LoH3 shot how the top right of the 礻has a clear pixel as it touches the 兄, but in the generated character image, it's all squashed together.

The result is that the program can't find the characters since they don't match. I'm not sure if there's anything you can do about this, or if I even correctly identified the issue, but I thought I'd mention it either way. And yes, I did generate the TTF font myself from the same system that this screenshot is taken from.
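For anyone wanting to compare glyphs the same way, here is a small Python sketch. It implements only a naive 1-pixel horizontal widening (every set pixel also sets its right neighbour) plus a pixel diff against a captured glyph. Whether FB.exe's text_xmax option works exactly like this is an assumption, the function names are hypothetical, and the rule the PC98 uses to keep its clear pixels is not reproduced.

```python
def widen(glyph):
    """Naive widening: each set pixel also sets the pixel to its right."""
    out = []
    for row in glyph:
        wide = row + [0]                 # result is one pixel wider
        for x, px in enumerate(row):
            if px:
                wide[x + 1] = 1
        out.append(wide)
    return out

def diff_pixels(generated, captured):
    """(x, y) positions where two same-sized glyph bitmaps disagree."""
    return [(x, y) for y, (a, b) in enumerate(zip(generated, captured))
            for x, (p, q) in enumerate(zip(a, b)) if p != q]
```

Running diff_pixels on a widened glyph and the matching crop from a screenshot lists exactly the pixels that stop the pixel-perfect comparison from succeeding.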
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: baka-neko on February 10, 2018, 09:57:44 am
@Aeana:

The included PC98 font only supports regular characters, not bold. However, some games, such as the Words Worth Special Disk, use the regular font widened horizontally by 1 pixel, which you can build using FB.exe.

Unfortunately, it seems that a proper bold font exists for the NEC PC98 and that some games use it. I wasn't able to find a way to rip it, and it cannot be generated from the TTF font, even by writing with it in bold.
Title: Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
Post by: Aeana on February 10, 2018, 08:37:53 pm
Hmm, that's interesting. As I mentioned, the kana match perfectly when generated with text_xmax set to 1 in the font profile. Ah well, that's too bad.

EDIT:
After using makefont32 (http://www.retropc.net/yui/np2tool/index.html) to generate a new font.rom from MS Gothic, I'm convinced that the bold font here must be generated on the fly somehow: the output font.bmp does not contain any bold characters, and making a new font.rom changes the in-game font to a bold MS Gothic with the same clear-pixel characteristics that I outlined above.