Subtitles for cutscenes, be they in game or videos, are a pain.
For videos you have two main approaches:
1) Figure out the video format and encode the subtitles into the video, with whatever quality loss re-encoding causes. Video codecs that could play back on the PS1 are not renowned for their quality, hence most such projects wander off to find a PC copy or reconstruct the video from trailers and whatnot. If it is a known format with a known encoder then great. If it is a simplistic motion JPEG or simple boxes style codec then also great. Most likely, though, it is something that was modern at the time, with nice P frames, I frames, maybe B frames, motion tracking and the usual colour formats on top of it all. And that is just decoding; you get to reverse that and write the analysis code an encoder needs (or maybe you can cheat, not use compression and take the size hit). I usually get people to look at the format for MPEG1 to get a feel for what is involved (impossibly ancient and basic, possibly even by the time of this game, as MPEG2 was already a thing and MPEG4 had just been released as well).
1a) If you have an encoder for some other format and source code for it then you can potentially put your own decoder into the game and go from there.
2) Overlay the subtitles on top of the video. Even assuming the video playback does not peg the CPU you still get to figure out the assembly hacks and do them.
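On the "not use compression and take the size hit" idea in 1), a bit of arithmetic shows how big that hit gets. This is a rough sketch only: the resolution, bit depth, frame rate, disc capacity and drive speed below are illustrative assumptions, not figures taken from any particular game.

```python
# Rough size of uncompressed video versus what a CD can hold and deliver.
# Every number here is an illustrative assumption.

def raw_data_rate(width, height, bits_per_pixel, fps):
    """Bytes per second the drive must deliver for uncompressed playback."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps

def raw_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Bytes needed to store the clip with no compression at all."""
    return raw_data_rate(width, height, bits_per_pixel, fps) * seconds

CD_CAPACITY = 650 * 1024 * 1024   # assumed ~650 MiB disc
DOUBLE_SPEED_RATE = 300 * 1024    # assumed ~300 KiB/s for a 2x drive

rate = raw_data_rate(320, 240, 16, 15)
size = raw_video_bytes(320, 240, 16, 15, 60)
print(f"one minute of raw video: {size / 2**20:.1f} MiB")
print(f"needs {rate / 1024:.0f} KiB/s, drive gives {DOUBLE_SPEED_RATE / 1024:.0f} KiB/s")
print(f"minutes that fit on the disc: {CD_CAPACITY / (rate * 60):.1f}")
```

With numbers like these the drive speed bites before the disc space does, so "no compression" in practice tends to mean a smaller frame, a lower frame rate or a very simple codec rather than truly raw frames.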
In-game cutscenes are usually a variation of 2) unless the game already has the functionality (is it available in some other scene, or maybe the intro or credits, but not in this one?).
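For a feel of what approach 2) boils down to once the assembly side is sorted, here is a toy version in Python: look up which line should be on screen for the current frame, then stamp its glyphs onto the decoded frame before it is displayed. The frame layout, the tiny font, the cue timings and the function names are all hypothetical; in a real hack this logic lives in hooked game code, not a script.

```python
# Toy overlay: stamp 1-bit glyphs onto a decoded frame before it is shown.
# Frame layout, font, cue table and names are all made up for illustration.

FONT = {  # 3x5 glyphs, one string per row, '#' = lit pixel
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "I": ["###", ".#.", ".#.", ".#.", "###"],
}

CUES = [(0, 44, "HI")]  # (first frame, last frame, text)

def cue_for(frame_no):
    """Return the subtitle text for this frame number, or None."""
    for first, last, text in CUES:
        if first <= frame_no <= last:
            return text
    return None

def blit_text(frame, text, x, y, colour):
    """Write text into frame (a list of rows of pixel values) in place."""
    for ch in text:
        glyph = FONT.get(ch)
        if glyph is not None:
            for row, line in enumerate(glyph):
                for col, bit in enumerate(line):
                    py, px = y + row, x + col
                    # Clip so a long line cannot write outside the frame.
                    if bit == "#" and 0 <= py < len(frame) and 0 <= px < len(frame[0]):
                        frame[py][px] = colour
        x += 4  # glyph width plus one pixel of spacing

frame = [[0] * 12 for _ in range(7)]  # stand-in for one decoded video frame
text = cue_for(10)
if text:
    blit_text(frame, text, 1, 1, 7)
for row in frame:
    print("".join("#" if p else "." for p in row))
```

The per-pixel loop is the part that costs CPU time on real hardware, which is why it matters whether video playback has already pegged the CPU.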
If another region has subtitles suitable for the task in the video files (though if it is a nice bit of Japanese embedded in the video then... yeah) and the devs doing localisation had it slip their mind or something then a file swap might be the order of the day. Whether you have to chase this with an audio swap, or swapping out the audio in the video file, is a different matter but still easier than reverse engineering the format and making an encoder*. That is the sort of thing you might want to try yourself too.
*As I sit here there are encoders with leaked versions, some where the devs released the source, a few ancient PC games that used "basically a slideshow"/marginally tweaked known formats/ultra simple formats, and a few others that usually use known formats with their own container (depending upon where you are in the world the video, the audio, the subtitles and the container might each have their own royalties to pay, possibly even on a per released copy basis, hence why game devs often use things like rad/bink, CRI and all the others when they could trivially have made an AVI or MP4 decoder). There are a few more decoders (not least of all because of how many things use rad/bink) and even some stuff that plays video the way the various audio playback options work, by emulating a cut down ROM that only knows how to play back stuff. Either way the general dearth of them speaks to how hard this is.
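Before trying a file swap it helps to know which files actually differ between the regions. A minimal sketch, assuming you have already extracted both discs to two folders on your PC (the function names are made up for this example):

```python
# Compare two extracted disc dumps and list which files differ.
# Reads each file whole, which is fine for CD-sized files.
import hashlib
import sys
from pathlib import Path

def file_digests(root):
    """Map each file's path relative to root to the SHA-1 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha1(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }

def compare_dumps(dir_a, dir_b):
    """Return (changed, only_in_a, only_in_b) as sorted lists of relative paths."""
    a, b = file_digests(dir_a), file_digests(dir_b)
    changed = sorted(name for name in a.keys() & b.keys() if a[name] != b[name])
    return changed, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())

if __name__ == "__main__" and len(sys.argv) == 3:
    changed, only_a, only_b = compare_dumps(sys.argv[1], sys.argv[2])
    for label, names in (("changed", changed), ("only in first", only_a),
                         ("only in second", only_b)):
        for name in names:
            print(f"{label}: {name}")
```

Video files that show up as changed are the candidates for a swap. Ones that are byte for byte identical across regions tell you the subtitles, if any, are not baked into that video.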