
Topic: Real-time extraction of text from memory of emulated games?

Chopin

  • Newbie
  • Posts: 1
So, I'm learning Japanese and doing so by finding all sorts of interesting ways to immerse myself, and these days there are so many tools that facilitate that, like hover dictionaries and apps that can extract text from visual novels.

It seems like the holy grail, though, would be some way to extract the text displayed by emulated games in real time, so that tools like hover dictionaries could be used on it. RetroArch can already sort of do this with its AI service, but that relies on OCR calls, which are amazingly slow (not to mention expensive if you're making literally thousands of calls in one playthrough), and the text gets sent to a webpage that has to be refreshed manually. The service is designed to machine-translate the text and overlay the translation on the game, so it isn't really effective for getting raw text output if you're studying the language. Refreshing a webpage for every single line of text in a game is hardly feasible, especially when the OCR calls themselves take time.

Clyde Mandelin can be seen using a tool that does exactly this in his live translation of FFVI; it seems to be one he made himself, though he's never released the code for it. Clearly it can be done, but I assume it isn't all that easy, since nobody has built a general-purpose tool for it yet.

https://youtu.be/YALeCaRmrqU

So, what exactly would such a project entail? I'm guessing you'd need a unique table for every game you played, since there was probably no standard way of storing text in games before the 2000s, but might it be possible to automate the creation of such tables somehow?

Nightcrawler

  • Hero Member
  • Posts: 5790
    • Nightcrawler's Translation Corporation
Re: Real-time extraction of text from memory of emulated games?
« Reply #1 on: July 16, 2020, 09:35:00 am »
He has released this tool. It's called 'Wanderbar' and was released here:
https://www.romhacking.net/forum/index.php?topic=27177.0
TransCorp - Over 20 years of community dedication.
Dual Orb 2, Wozz, Emerald Dragon, Tenshi No Uta, Glory of Heracles IV SFC/SNES Translations

Vehek

  • Full Member
  • Posts: 206
Re: Real-time extraction of text from memory of emulated games?
« Reply #2 on: July 16, 2020, 11:02:30 am »
And to point something out, that tool displayed previously extracted text, it didn't hook text.

FAST6191

  • Hero Member
  • Posts: 3100
Re: Real-time extraction of text from memory of emulated games?
« Reply #3 on: July 16, 2020, 03:25:55 pm »
There are two approaches: have an external program (think Cheat Engine, ArtMoney, emuhaste) reach into memory and grab the text, or have the emulator itself output it (many emulators have Lua support for a reason). You might also want to poke around some disabled/blind gamer forums, as they have some options here (useful if you can read high-contrast large text but struggle with grey-on-dark, faded graphics from a device that considered 240 vertical pixels an acceptable resolution).

As you surmise, most old games handle their text very differently from one another, and newer ones still do (though later devices lean on standard encodings more and may not have to jump through quite as many hoops to display all the characters they want: waste a couple of kilobytes on a DS game and nobody cares, waste a couple on the NES and it is a big deal). Even two games with identical encodings, where one wraps lines automatically and the other needs every new line dictated, would require different enough approaches that talk of a generic approach becomes almost meaningless. The same goes once you add in formatting, punctuation, graphical effects, line positioning options, placeholders, calculated values and prebaked options (the yes/no box is often a single value, and names may be too).
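To make the point about control codes concrete, here is a minimal Python sketch of a table-based decoder that also has to interpret a forced line break, an end-of-string marker and a name placeholder. Every byte value, as well as the `decode` function itself, is hypothetical and invented for illustration; none of it is taken from a real game.

```python
# Hypothetical encoding for an imaginary game; none of these byte values
# come from a real ROM.
TABLE = {0x1E: "あ", 0x1F: "い", 0x20: "う", 0x21: "え", 0x22: "お"}
CODE_NEWLINE = 0xF0  # hypothetical: forced line break written into the script
CODE_NAME = 0xF1     # hypothetical: placeholder replaced by the player's name
CODE_END = 0xFF      # hypothetical: end of string


def decode(data: bytes, player_name: str) -> str:
    """Turn raw script bytes into readable text, handling control codes."""
    out = []
    for b in data:
        if b == CODE_END:
            break  # anything after the terminator is leftover buffer junk
        elif b == CODE_NEWLINE:
            out.append("\n")
        elif b == CODE_NAME:
            out.append(player_name)
        else:
            # Unknown bytes are shown as [XX] so gaps in the table stay visible.
            out.append(TABLE.get(b, f"[{b:02X}]"))
    return "".join(out)


print(decode(bytes([0x1E, 0x1F, 0xF0, 0xF1, 0x22, 0xFF]), "クロノ"))
```

A game that wraps lines automatically would have no equivalent of the hypothetical `CODE_NEWLINE`, so a decoder for it would have to reproduce the game's wrapping itself, which is exactly why one script rarely carries over to another game unchanged.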

You can still do something. Indeed, give me a game and, provided I have made a text spitter program before, I will probably have one sorted more quickly than I would have a useful dump of the text. When that isn't the case, it is mainly because some late-stage part of the game did something odd that I did not catch because I only used earlier parts or an incomplete playthrough.

In general, then, you would have either an external program or an internal Lua script monitoring a place where it knows the text will be loaded and decoded. Anything that happens there gets noted, decoded with a table you made beforehand, has any relevant handling/formatting taken care of, and is spat out in a readable format somewhere else.
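The monitoring loop described above can be sketched in Python as follows. It assumes you already know which RAM range holds the current line and have a table for it; `snapshots` stands in for however the bytes are actually read (a per-frame Lua callback inside the emulator, or an external memory reader), and the table, end code and function names are all hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical table and terminator; a real one would be built per game.
TABLE = {0x1E: "あ", 0x1F: "い", 0x20: "う"}
END = 0xFF


def decode(buf: bytes) -> str:
    """Decode one text buffer up to the (hypothetical) end code."""
    chars = []
    for b in buf:
        if b == END:
            break
        chars.append(TABLE.get(b, f"[{b:02X}]"))
    return "".join(chars)


def watch(snapshots: Iterable[bytes]) -> Iterator[str]:
    """Yield decoded text each time the watched buffer changes.

    `snapshots` stands in for reading the same RAM range once per frame.
    """
    last = None
    for buf in snapshots:
        if buf != last:  # only react when the game writes something new
            last = buf
            text = decode(buf)
            if text:  # skip an empty or cleared buffer
                yield text


frames = [bytes([0xFF, 0, 0]), bytes([0x1E, 0xFF, 0]),
          bytes([0x1E, 0xFF, 0]), bytes([0x1E, 0x1F, 0xFF])]
for line in watch(frames):
    print(line)
```

Games that print text one character at a time will make this yield partial lines as the buffer fills (as the sample frames do); a real script might instead wait for the end code, or for a "text box finished" flag if the game has one.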

aqualung

  • Full Member
  • Posts: 225
Re: Real-time extraction of text from memory of emulated games?
« Reply #4 on: July 20, 2020, 05:18:17 am »
Quoting Vehek: "And to point something out, that tool displayed previously extracted text, it didn't hook text."

Perhaps it's my lack of romhacking knowledge speaking, but even though what you say is right, I don't see why Wanderbar couldn't be used to extract a game ROM's text and display it in the bundled browser window. Some time ago Mato made me a little Lua script for the SNES game Jungle Wars 2 that, whenever a text dialog appears on screen, captures that dialog's line ID and shows it in the browser. If that value can be obtained, I can't see why the full line of text, which I suppose must be loaded into RAM as an array of hex values, couldn't be captured and shown in the browser too, provided the text doesn't use a compression routine or something (*).

(*) Yes, I know the text would be in an unknown encoding, but we could build a hashmap or dictionary (or whichever structure Lua handles) into the Lua script, holding the correspondence between the game's encoding and the matching HTML encoding, and convert on the fly before printing the text in the browser (hope I managed to explain myself well).

For instance, if the game encodes the hiragana "a" (あ) as the hex value 0x1E, the hashmap would have a key/value entry "0x1E -> &#x3042;", and the same for every other character.
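The lookup described in this post amounts to a dictionary from game bytes to characters. Here is a minimal Python sketch, assuming 0x1E = あ as in the example and inventing the other two entries. HTML numeric character references take the Unicode code point in hex after `&#x` (or in decimal after `&#`), so あ (U+3042) becomes `&#x3042;`.

```python
# Game byte -> character. 0x1E = あ follows the example in the post; the
# other two entries are hypothetical.
GAME_TO_CHAR = {0x1E: "あ", 0x1F: "い", 0x20: "う"}


def to_html_entities(data: bytes) -> str:
    """Convert game-encoded bytes into HTML numeric character references."""
    parts = []
    for b in data:
        ch = GAME_TO_CHAR.get(b)
        if ch is None:
            parts.append(f"[{b:02X}]")  # byte missing from the table
        else:
            parts.append(f"&#x{ord(ch):X};")  # e.g. あ (U+3042) -> &#x3042;
    return "".join(parts)


print(to_html_entities(bytes([0x1E, 0x20])))  # &#x3042;&#x3046;
```

Since a page served as UTF-8 can also display the characters directly, the entity step is optional; the part that has to be built separately for each game is the byte-to-character table.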

Would something like this be possible?
« Last Edit: July 20, 2020, 05:55:29 am by aqualung »

Jorpho

  • Hero Member
  • *****
  • Posts: 4783
  • The cat screams with the voice of a man.
    • View Profile
Re: Real-time extraction of text from memory of emulated games?
« Reply #5 on: July 20, 2020, 11:31:47 am »
I recall way, way back in the day when Demiforce was making the Radical Dreamers translation, he mentioned something about using a special build of ZSNES that made giant save states. I don't think that was ever released, though.
This signature is an illusion and is a trap devised by Satan. Go ahead dauntlessly! Make rapid progres!

Tomato

  • Sr. Member
  • Posts: 365
    • Legends of Localization
Re: Real-time extraction of text from memory of emulated games?
« Reply #6 on: July 20, 2020, 02:28:12 pm »
Yeah, Wanderbar can definitely extract text (or any data really) on the fly, I just prefer pre-dumping text when possible because I can manually edit it and format it to fit my needs.
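Combining the line-ID approach from Reply #4 with a pre-dumped script, the display side reduces to a lookup. Here is a minimal Python sketch, assuming the game keeps the current line ID in a known RAM variable. The `SCRIPT` contents, the IDs and `lines_for_ids` are hypothetical, and none of this is taken from Wanderbar.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical pre-dumped script, keyed by the line ID the game stores in RAM.
SCRIPT = {
    0x0010: "おはよう。",
    0x0011: "きょうは いい てんき だね。",
}


def lines_for_ids(id_reads: Iterable[Optional[int]]) -> Iterator[str]:
    """Yield the pre-dumped line whenever the watched line ID changes.

    `id_reads` stands in for reading the game's current-line-ID variable
    once per frame; None means no text box is open.
    """
    last = None
    for line_id in id_reads:
        if line_id != last:
            last = line_id
            if line_id is not None:
                yield SCRIPT.get(line_id, f"<unknown line {line_id:#06x}>")


for line in lines_for_ids([None, 0x10, 0x10, None, 0x11]):
    print(line)
```

Because the text comes from the dump rather than from RAM, it can be hand-edited and formatted in advance, and compressed or oddly encoded text stops being a problem at display time; the trade-off is that the dump has to exist first.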