Author Topic: Font Based Character Recognizer for Japanese games - Looking for feedback!  (Read 4048 times)

baka-neko

Hello,

I've made a free program that detects any Japanese text in a running game and copies it to the clipboard. You can then use a third-party program such as Translation Aggregator to display the furigana (kanji readings) or even a translation!

The requirement is to provide a profile for each game, including a pixel-perfect matching font.

The font should look like this:





Please give it a try by making profiles / fonts for any game or platform you like. I would love to have early feedback and suggestions. Thanks!

Download and more information on the official website:
neko.works/FBCR

Here are a few usage examples:



« Last Edit: January 06, 2018, 02:51:59 pm by baka-neko »

Squall_FF8

Hi baka-neko,

I think you have a great idea for fast translation of games that doesn't require patching the game. I can think of a number of applications of that idea!

There is one tiny detail that I didn't get:
Quote
I've made a free program that detects any Japanese text in a running game and copies it to the clipboard.
What triggers the translation?

chillyfeez

A couple of questions:
Does the program, um, know what it's reading, or is it only picking up instances of the characters the user feeds into it? What I mean is, once I give it this profile of potentially hundreds+ kana and kanji, does it match them with its internal alphabet, or do I have to go through some sort of mapping process?

Is it possible to implement a function that allows the user to feed a sprite sheet (so to speak) of Japanese characters instead of individual images of each character? I don't know much about later consoles, but I know that with a lot of 8- and 16-bit games I can load the ROM into a graphics editor and instantly make a BMP copy of the entire alphabet. Copying individual characters, though, would take that same amount of time, multiplied by the number of characters used in the game, meaning hours of prep time for each new game.
Ongoing project: "Final Fantasy IV: A Threat From Within"

Latest Demo

baka-neko

Quote
There is one tiny detail that I didn't get: What triggers the translation?

Hello,

For the translation, you have 2 options:

- Real-time machine translation using Translation Aggregator, or any translation software that reads the clipboard.

- Soft translation: when FBCR detects Japanese text, it writes it to the file id_of_the_game.txt in the dumps sub-folder, with each unique sentence on a separate line. If you translate each line, save the result to a file named id_of_the_game.EN.txt, and activate the translation in the profile by adding the command translation:EN, then while you play FBCR will copy the matching translation to the clipboard instead of the Japanese text. You'll just need a program that displays the English clipboard contents, such as "clipbrd", which comes with Windows.
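
The soft-translation lookup described above could be sketched in Python roughly like this. It is only a guess at the behaviour, not FBCR's actual code: it assumes that line N of id_of_the_game.EN.txt is the translation of line N of id_of_the_game.txt and that both files are UTF-8. All function names are made up for illustration.

```python
from pathlib import Path
from typing import Dict, List


def build_translation_table(source_lines: List[str],
                            translated_lines: List[str]) -> Dict[str, str]:
    """Pair each dumped sentence with the translation on the same line number."""
    table = {}
    for japanese, english in zip(source_lines, translated_lines):
        japanese, english = japanese.strip(), english.strip()
        if japanese and english:  # skip lines that are not translated yet
            table[japanese] = english
    return table


def load_translation_table(dumps_dir: str, game_id: str,
                           lang: str = "EN") -> Dict[str, str]:
    """Read <game_id>.txt and <game_id>.<lang>.txt from the dumps folder."""
    base = Path(dumps_dir)
    source = (base / f"{game_id}.txt").read_text(encoding="utf-8").splitlines()
    translated = (base / f"{game_id}.{lang}.txt").read_text(encoding="utf-8").splitlines()
    return build_translation_table(source, translated)


def text_for_clipboard(detected: str, table: Dict[str, str]) -> str:
    """Return the translation if there is one, otherwise the detected text."""
    return table.get(detected.strip(), detected)
```

Falling back to the Japanese text when a line has no translation yet is a design choice of this sketch; the thread does not say what FBCR does in that case.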

baka-neko

Quote
Does the program, um, know what it's reading, or is it only picking up instances of the characters the user feeds into it? What I mean is, once I give it this profile of potentially hundreds+ kana and kanji, does it match them with its internal alphabet, or do I have to go through some sort of mapping process?

The detected Japanese text is converted to a standard UTF8 sentence that is copied to the clipboard.

Quote
Is it possible to implement a function that allows the user to feed a sprite sheet (so to speak) of Japanese characters instead of individual images of each character? I don't know much about later consoles, but I know that with a lot of 8- and 16-bit games I can load the ROM into a graphics editor and instantly make a BMP copy of the entire alphabet. Copying individual characters, though, would take that same amount of time, multiplied by the number of characters used in the game, meaning hours of prep time for each new game.

Not possible currently, but you could write a Python script or similar that will separate the characters from the sprite sheet into a format that is compatible with FBCR.

The font format in FBCR is a list of .bmp images, one for each character, named 0001+, plus a UTF-8 corpus.txt file containing all the corresponding Japanese characters on one line, in the same order as the images.
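
As a rough idea of what such a script could look like, here is a minimal Python sketch that cuts a sprite sheet into fixed-size tiles and writes them as 0001.bmp, 0002.bmp, ... plus a one-line corpus.txt. Several things are assumptions: the sheet is already loaded as rows of (red, green, blue) pixels, every character occupies a tile of the same size, and 24-bit uncompressed BMPs are acceptable (the thread does not say which bit depth or colors FBCR expects). The function names are invented for this example.

```python
import struct
from pathlib import Path
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue)
Image = List[List[Pixel]]      # rows of pixels, top row first


def split_sheet(sheet: Image, tile_w: int, tile_h: int) -> List[Image]:
    """Cut a sprite sheet into tiles, left to right, then top to bottom."""
    rows, cols = len(sheet) // tile_h, len(sheet[0]) // tile_w
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append([row[c * tile_w:(c + 1) * tile_w]
                          for row in sheet[r * tile_h:(r + 1) * tile_h]])
    return tiles


def bmp_bytes(tile: Image) -> bytes:
    """Encode a tile as an uncompressed 24-bit BMP (bottom-up rows, BGR order)."""
    height, width = len(tile), len(tile[0])
    padding = b"\x00" * ((4 - (width * 3) % 4) % 4)  # rows are padded to 4 bytes
    body = b"".join(
        b"".join(bytes((b, g, r)) for r, g, b in row) + padding
        for row in reversed(tile)
    )
    file_header = struct.pack("<2sIHHI", b"BM", 54 + len(body), 0, 0, 54)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24,
                              0, len(body), 0, 0, 0, 0)
    return file_header + info_header + body


def export_font(sheet: Image, tile_w: int, tile_h: int,
                characters: str, out_dir: str) -> List[str]:
    """Write one numbered .bmp per tile plus corpus.txt; return the file names."""
    tiles = split_sheet(sheet, tile_w, tile_h)
    if len(tiles) != len(characters):
        raise ValueError("need exactly one character per tile")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for index, tile in enumerate(tiles, start=1):
        name = f"{index:04d}.bmp"
        (out / name).write_bytes(bmp_bytes(tile))
        names.append(name)
    (out / "corpus.txt").write_text(characters, encoding="utf-8")
    return names
```

You would still have to type the characters string by hand in sheet order, since the script has no way of knowing which character each tile shows.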

chillyfeez

Quote
The detected Japanese text is converted to a standard UTF8 sentence that is copied to the clipboard.
Hmm... Either you didn't understand my question, or I don't understand the answer. Both of my questions relate to the amount of pre-play prep the user would need to do before being able to play the game and have text copied to the clipboard.
What I was trying to ask is, let's say I only provide FBCR with one character: 人. Does FBCR automatically recognize that as "jin," or do I have to tell the program, "this is 'jin': 人"?

baka-neko

Quote
What I was trying to ask is, let's say I only provide FBCR with one character: 人. Does FBCR automatically recognize that as "jin," or do I have to tell the program, "this is 'jin': 人"?

The font is defined by 2 parts: the .bmp images (in this case an image of 人 named 0001.bmp) and a UTF-8 corpus.txt file with all the corresponding Japanese characters on a single line (in this case it will contain just one character: 人).

You can see an example for Surging Aura here:





If you're talking about the readings, dictionary definitions, etc., all of these are handled by third-party software such as Translation Aggregator.

Squall_FF8

Quote
- Soft translation, basically, when FBCR detects the Japanese text ...
Sorry, I understood the translation part from the first post. My question is more like:
What triggers FBCR's detection of Japanese text?

Let's say I have prepared all the prerequisites. I start the game. I start all the tools (FBCR, translators, ...). While playing, Japanese text pops up on the screen. How does FBCR know that there is something that needs OCR? Do I have to press a keyboard button, do I have to click something ...?

baka-neko

Quote
Let's say I have prepared all the prerequisites. I start the game. I start all the tools (FBCR, translators, ...). While playing, Japanese text pops up on the screen. How does FBCR know that there is something that needs OCR? Do I have to press a keyboard button, do I have to click something ...?

FBCR grabs the image from the game every 250 milliseconds and checks whether anything has changed. If so, it waits until 3 frames are identical, in case the text is still being drawn. After that it triggers the text detection, which is 100% visual but is not OCR. You can change both of these parameters (the interval and the number of identical frames) in the profile. So basically you don't need to do anything.

You can also force the text detection at any time by pressing F10, and you can change that to another function key in the profile.
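
To make the timing logic above a bit more concrete, here is a small Python sketch of a "wait until the picture settles" loop. It is an illustration under assumptions, not FBCR's code: grab is a hypothetical callback that returns the current frame (anything comparable with ==), returning None is used here only as a way to end the loop, and the defaults mirror the 250 ms and 3 frames mentioned above.

```python
import time
from typing import Callable, Iterator


def stable_frames(grab: Callable[[], object],
                  interval_s: float = 0.25,
                  required_identical: int = 3,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[object]:
    """Yield a frame once it has changed and then stayed identical for N captures."""
    last_reported = None   # last frame handed to the text detection
    previous = None        # frame seen on the previous capture
    run_length = 0         # how many captures in a row matched `previous`
    while True:
        frame = grab()
        if frame is None:  # hypothetical end-of-stream signal
            return
        if frame == previous:
            run_length += 1
        else:
            previous, run_length = frame, 1
        # Trigger only when the frame has settled and differs from the last report.
        if run_length == required_identical and frame != last_reported:
            last_reported = frame
            yield frame
        sleep(interval_s)
```

Each yielded frame is a point where the text detection would run; a forced detection (the F10 key) would simply bypass this loop.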

Squall_FF8

Quote
After that it triggers the text detection, which is 100% visual but is not OCR

"Optical character recognition (also optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast)" (Wikipedia)
In short: extracting text from an image is OCR :D

Anyway, I think your idea is excellent and will have its applications. Since I suspect that you do pixel matching, it's important to use filters that produce the same pixels regardless of what is around them; that is why things like bilinear interpolation should be off.

baka-neko

Re: Font Based Character Recognizer for Japanese games - Looking for feedback!
« Reply #10 on: January 08, 2018, 01:52:22 pm »
Quote
In short: extracting text from an image is OCR :D

Might be ;) What I wanted to say is that I'm not using any OCR SDKs or algorithms, as I'm just doing pixel perfect comparisons, with just a few tricks on colors.
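
For readers wondering what "pixel perfect comparison" can mean in practice, here is a deliberately simplified Python sketch. It is not FBCR's algorithm: it assumes a single known text color, fixed-width glyph cells that start at the left edge of the line, and a line exactly one glyph high. The names binarize and read_line are made up.

```python
from typing import Dict, Sequence, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]   # rows of 0/1 values


def binarize(pixels: Sequence[Sequence[object]], text_color: object) -> Bitmap:
    """Map each pixel to 1 if it has exactly the text color, otherwise 0."""
    return tuple(tuple(1 if p == text_color else 0 for p in row) for row in pixels)


def read_line(line: Bitmap, glyphs: Dict[Bitmap, str], glyph_w: int) -> str:
    """Read fixed-width cells by exact bitmap lookup; unknown cells become '?'."""
    text = []
    for x in range(0, len(line[0]) - glyph_w + 1, glyph_w):
        cell = tuple(row[x:x + glyph_w] for row in line)
        text.append(glyphs.get(cell, "?"))
    return "".join(text)
```

Because the lookup is an exact match, a single pixel that differs (from filtering, shaders or non-integer scaling) is enough to turn a character into '?'.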

Quote
Anyway, I think your idea is excellent and will have its applications. Since I suspect that you do pixel matching, it's important to use filters that produce the same pixels regardless of what is around them; that is why things like bilinear interpolation should be off.

Thanks! Yes, the game needs to be displayed clean. For instance, in RetroArch you need to disable Bilinear Filtering and shaders, and set Aspect Ratio to 1:1 PAR (to have square pixels). You can use scaling, but it needs to be an integer scale.
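
A quick way to see why only integer scaling without filtering works: with an unfiltered integer scale, every original pixel becomes a block of identical pixels, so the native image can be recovered exactly. The Python sketch below illustrates that idea only; whether FBCR downsamples like this internally is not stated in this thread, and both function names are hypothetical.

```python
from typing import List, Sequence


def is_clean_integer_scale(pixels: Sequence[Sequence[object]], scale: int) -> bool:
    """True if every scale x scale block holds a single color."""
    return all(
        pixels[y][x] == pixels[y - y % scale][x - x % scale]
        for y in range(len(pixels))
        for x in range(len(pixels[0]))
    )


def downscale(pixels: Sequence[Sequence[object]], scale: int) -> List[List[object]]:
    """Undo an unfiltered integer upscale by keeping one pixel per block."""
    return [list(row[::scale]) for row in pixels[::scale]]
```

With bilinear filtering or a non-integer scale, neighbouring colors get blended, the first check fails, and there is no exact native image left to compare against the font.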

Vanya

« Reply #11 on: January 11, 2018, 04:49:16 pm »
Pretty cool. I imagine something like this could be adapted to create an overlay in a custom emulator, not unlike the things that are possible in FCEUX with Lua scripting.

baka-neko

« Reply #12 on: January 12, 2018, 06:01:39 am »
Quote
Pretty cool. I imagine something like this could be adapted to create an overlay in a custom emulator, not unlike the things that are possible in FCEUX with Lua scripting.

Thanks! As it works with the clipboard, it should be easy enough for any emulator to display the text in-game!

danke

« Reply #13 on: January 12, 2018, 08:14:58 pm »
This doesn't seem that practical for games with hundreds of kanji. Especially if they aren't in any kind of standard order.

Vanya

« Reply #14 on: January 13, 2018, 03:09:22 am »
Quote
This doesn't seem that practical for games with hundreds of kanji. Especially if they aren't in any kind of standard order.

True, but for many games that have a lot of technical difficulty in creating a translation this is definitely way better than nothing at all.

baka-neko

« Reply #15 on: January 13, 2018, 11:36:24 am »
Quote
This doesn't seem that practical for games with hundreds of kanji. Especially if they aren't in any kind of standard order.

If the font is standard (that is, it can be generated from a TrueType font), as in most PC games, you can build the font using the included Font Builder in just a few minutes.

If it is not, as in most console games, the good news is that once you have completed one font, you could probably use it with other games from the same developer or on the same platform!

elpan

« Reply #16 on: January 21, 2018, 06:48:57 pm »
Hi, first off, this is fantastic, baka-neko. I can see a use for it with some Chinese and Korean games like Wind Fantasy XX. I presume all types of fonts are possible.

I was trying to see what I can do with the PC98 version of "Kuro No Ken". My understanding is that, in the profile, width and height are for the text box area you're trying to match with the selected font, and left and top are how far the box starts from the edge of the game window. I tried messing around with that, but the best I could get was .... or --- characters.
I then decided to check some of the examples you provided. I fired up Surging Aura in RetroArch, followed your instructions, and got the same thing. I tried setting the resolution to the native Mega Drive 320x224 and still got nothing.

Looking at the log, I see things like this: Valid Rect: LEFT:465 TOP:83 RIGHT : 31 BOTTOM : 0. I'm wondering what it all means and whether it helps. Or perhaps I lack understanding of it all. Thanks.

Jorpho

« Reply #17 on: January 21, 2018, 08:52:24 pm »
If the game is using standard Windows text rendering, shouldn't it be possible to hook into the process directly and determine exactly what characters the game is using?  I think the NJStar CJK Viewer used to do that.  (Anyone else remember that one?)

Of course, if the game is running in an emulator or somehow using its own text renderer, that would be a different matter.
This depresses me. I feel like a goldfish right now...

baka-neko

« Reply #18 on: January 22, 2018, 11:08:11 am »
Quote
Hi, first off, this is fantastic, baka-neko. I can see a use for it with some Chinese and Korean games like Wind Fantasy XX. I presume all types of fonts are possible.

Thanks! Yes, it should be possible to use it for any language, as long as you provide the font.

Quote
I was trying to see what I can do with the PC98 version of "Kuro No Ken". My understanding is that, in the profile, width and height are for the text box area you're trying to match with the selected font, and left and top are how far the box starts from the edge of the game window. I tried messing around with that, but the best I could get was .... or --- characters.

The first thing you need to check is the game's window name. For instance, if you're using Neko Project, the command should look like this:
window_name_starts_with:Neko Project

Then check the scale of the window. If it is native (640x400), the command looks like this:
scale:1

width and height are the size of the zone where the text is; left and top are its position within the client area. To choose precise values, press F9. This generates 3 images in the same folder as FBCR.exe; check the file that ends with _full.bmp.

Sometimes you need to adjust the top value, as the image grabbed from the game can change size. Use the F9 trick to find the correct value.
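
To show how these settings fit together, here is a short Python sketch that reads key:value lines and cuts the text zone out of a captured client-area image. It is only an illustration: the commands window_name_starts_with and scale appear above, but the exact spelling of the left, top, width and height commands, and how they interact with scale, are assumptions, as are the function names.

```python
from typing import Dict, List, Sequence, Tuple


def parse_profile(text: str) -> Dict[str, str]:
    """Parse 'command:value' lines into a dictionary, skipping anything else."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)   # split once so values may contain ':'
        settings[key.strip()] = value.strip()
    return settings


def text_zone(settings: Dict[str, str]) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the text zone in client-area pixels."""
    left, top = int(settings["left"]), int(settings["top"])
    return left, top, left + int(settings["width"]), top + int(settings["height"])


def crop(pixels: Sequence[Sequence[object]],
         zone: Tuple[int, int, int, int]) -> List[List[object]]:
    """Cut the text zone out of a full client-area capture."""
    left, top, right, bottom = zone
    return [list(row[left:right]) for row in pixels[top:bottom]]
```

Comparing a crop like this against the _full.bmp capture is essentially what the F9 trick lets you do by eye.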

Anyway, I've checked the game, and it seems to use the NEC PC98 font, so I've made a quick profile for the NPC dialogues. You would probably need another profile for the intro and cut-scenes. Here's a shot:


And here's the profile:
https://www.dropbox.com/s/ax8cafxjy6aqbzz/Kuro%20no%20Ken%20%28Forest%2C%20PC98%29%20%5Btimer%2C%20bottom%2C%20white%5D.txt?dl=1

Quote
I then decided to check some of the examples you provided. I fired up Surging Aura in RetroArch, followed your instructions, and got the same thing. I tried setting the resolution to the native Mega Drive 320x224 and still got nothing.

Same here: check the window name, make sure the scale matches the one in the profile, disable all shaders, and use the F9 trick to check the top value.

Quote
Looking at the log, I see things like this: Valid Rect: LEFT:465 TOP:83 RIGHT : 31 BOTTOM : 0. I'm wondering what it all means and whether it helps. Or perhaps I lack understanding of it all. Thanks.

This is just for debugging. The valid rect is the part of the rect you defined in the profile that includes at least 1 pixel with the text color.
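
One plausible reading of "valid rect" is the smallest rectangle that contains every pixel of the text color inside the profile's zone. The Python sketch below computes that bounding box. This is an interpretation, not FBCR's code, and how the LEFT / TOP / RIGHT / BOTTOM numbers in the log relate to it is not explained in this thread.

```python
from typing import Optional, Sequence, Tuple


def valid_rect(pixels: Sequence[Sequence[object]],
               text_color: object) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (left, top, right, bottom) of text-colored pixels, or None."""
    xs, ys = [], []
    for y, row in enumerate(pixels):
        for x, pixel in enumerate(row):
            if pixel == text_color:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None   # no text-colored pixel in the zone at all
    return min(xs), min(ys), max(xs), max(ys)
```

If a box like this comes out empty or far from where the text really is, the text color or the left / top values in the profile are likely off.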

Quote
If the game is using standard Windows text rendering, shouldn't it be possible to hook into the process directly and determine exactly what characters the game is using?  I think the NJStar CJK Viewer used to do that.  (Anyone else remember that one?)

Of course, if the game is running in an emulator or somehow using its own text renderer, that would be a different matter.

Yes, standard text hooking is possible with many Windows games, but not all, and it is not possible at all with console games.

elpan

« Reply #19 on: January 22, 2018, 06:06:50 pm »
Quote
The first thing you need to check is the game's window name. For instance, if you're using Neko Project, the command should look like this:
window_name_starts_with:Neko Project

Then check the scale of the window. If it is native (640x400), the command looks like this:
scale:1

width and height are the size of the zone where the text is; left and top are its position within the client area. To choose precise values, press F9. This generates 3 images in the same folder as FBCR.exe; check the file that ends with _full.bmp.

Sometimes you need to adjust the top value, as the image grabbed from the game can change size. Use the F9 trick to find the correct value.

Thank you, I understand much better now and was able to test a few profiles. Kuro no Ken is working; thanks for the profile.
Some feedback: if possible, it would be great to be able to load multiple profiles at the same time, or better yet, to define multiple text zones in the same profile for the different areas where text appears. Also, the parsing can seem slow sometimes.