
Author Topic: Data Finding  (Read 2487 times)

Yohyzo

  • Newbie
  • *
  • Posts: 2
    • View Profile
Data Finding
« on: June 25, 2013, 08:25:29 pm »
Hello. I was wondering if I could get some advice on methods of finding the location of data. Specifically, I am trying to find monster stats in Grandia 3 for the PS2, and my experience with ROM hacking consists of simple hex editing. The only method I know for finding data is just opening the files with a hex editor and looking for it, but I was hoping there were better ways. Any help is much appreciated.

Spikeman

  • Hero Member
  • *****
  • Posts: 1063
  • *unce unce unce*
    • View Profile
    • None at the moment, check out my Last.fm page instead?
Re: Data Finding
« Reply #1 on: June 25, 2013, 08:51:32 pm »
The best method would be to use a debugger to figure out what the code is doing and how the data is actually stored.

A less advanced method would be to use a corrupter utility (I prefer this one: http://www.romhacking.net/utilities/5/). Although I imagine this might be difficult to do with a PS2 game, for many reasons.
Open Source Hacking Projects: Guru Logic Champ, Telefang 2, (Want more? Check out my GitHub!)

FAST6191

  • Hero Member
  • *****
  • Posts: 3008
    • View Profile
Re: Data Finding
« Reply #2 on: June 25, 2013, 09:50:43 pm »
This is a subject as broad or broader than any other in ROM hacking and is basically half of ROM hacking (the other half being how do I interpret data I have found).

First, the PS2, much like most other consoles that use a CD/DVD for game storage, uses a filesystem and probably does not have the disc mapped to memory. This changes things a bit and makes some things easier and some things harder. For the record, the PS2 is a mix between standard ISO 9660 (open with any ISO editor) and raw DMA (you will have to fish it out with a hex editor, give or take a few area specific tools) depending upon the game, but more on that as things go on.

The king of the hill is a method called tracing. It is flawless, but as it necessitates messing around with assembly it has a higher barrier to entry than most other methods, though not as much as you might think (indeed I reckon it could be done without knowing a single assembly instruction; it is better if you do but not necessary for a lot of it). For this you need a proper hacker emulator (not quite sure what the PS2 stuff is up to these days) and you start by finding the data in memory (viewing it in hex, cheats, tile viewers or plain seeing it in the game and working back from there, giving you the first "ah there it is in memory"). When discussing the GBA I usually link http://www.romhacking.net/documents/361/ and it is much the same for every console: see it in memory, reset the game and set a breakpoint for something to write to that memory area. The emulator will stop things when that happens and, assuming it is the data you want, you follow it backwards. Typically it is only one step (normal cases) or two (when compression/manipulation has to be dealt with).
This is where the "standard ISO 9660 and raw DMA" stuff comes back in. On the GBA (and the SNES and the NES and everything else a lot of the RHDN userbase plays with more commonly) the ROM is in memory and handled accordingly. The NES has things like mappers to contend with but it is still there. The PlayStation (and other CD/DVD based consoles, or indeed more modern consoles like the DS) will not have this and will instead have a function (both in the game and more generally in terms of hardware) to read from the disc; you get to decode this function read and/or hardware call to the game storage section.

Next comes everything else. Where tracing was flawless, everything else (with the possible exception of corruption, if you work at it long enough) has huge pitfalls.
Filesystem stuff.
Back to the filesystem stuff: though it makes tracing subtly different and perhaps more difficult, it has three huge perks.
1) Filenames/Extensions -- developers are not always that hell bent on keeping J. Random Hacker from getting at the files in a game, and they have to get stuff done, so the file names and the extensions of the files can give an awful lot away.
2) Directory structure -- much like 1) really.
3) File sizes -- the 200 byte file is probably not going to be your 5 minute heavy CGI cutscene. It might be a cutscene if such a thing is rendered in the game engine though.

You can also use all three to remove files from the search list -- the contents of the audio and movie directories will probably not interest you if you are after ripping the text, which is a bonus as audio and video take up an awful lot of space. Now the developer could have chosen to stick the files in an archive file for some reason; dealing with that is what a good chunk of DS hacking is concerned with when it comes down to it. There is also the issue of magic stamps: for various reasons developers like to indicate file types by setting parts of the header, parts of the file or parts of the footer to known values.
http://astrogrep.sourceforge.net/ is great here.
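
To make the magic stamp idea concrete, here is a minimal sketch of scanning a blob (an archive file, say) for known signatures. The table below is an assumption for illustration only: it lists a few widely known magic values, not anything confirmed for a particular game, and `find_magics` is a made-up helper name.

```python
# Hypothetical signature table: a few well-known magic values, for illustration only.
MAGICS = {
    b"RIFF": "RIFF container (WAV/AVI)",
    b"\x89PNG": "PNG image",
    b"PK\x03\x04": "ZIP archive",
}

def find_magics(data: bytes, magics=MAGICS):
    """Return sorted (offset, description) pairs for every signature hit in data."""
    hits = []
    for magic, name in magics.items():
        start = 0
        # bytes.find returns -1 when there are no further matches
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, name))
            start = pos + 1
    return sorted(hits)

# In-memory stand-in for an archive: padding, a RIFF stamp, padding, a ZIP stamp.
sample = b"\x00" * 8 + b"RIFF" + b"\x00" * 4 + b"PK\x03\x04"
for offset, name in find_magics(sample):
    print(hex(offset), name)
```

Short stamps will throw up false positives in a big file, so treat the hits as places to look rather than as answers.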


Other methods
Hex searching.
I typically use this for and teach this to people that are looking for a palette for an image -- rip it from memory, do a search of the ROM for it or some fragment of it and you might be surprised how often it works. Things like compression, dynamic data (a number might change after it has been read from ram) and manipulated data (the way something is stored and the way something ends up in memory need not be the same) will trouble this. That said I have seen people find monster stats this way, typically it comes with a lot of work in the analysis stage (more on that a bit later) but it can happen.
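
A rough sketch of the "rip it from memory, search the file for it or some fragment of it" idea. Everything here is an assumption for illustration: `find_all` and `search_fragments` are made-up names, and the 8 byte fragment size is an arbitrary starting point.

```python
def find_all(haystack: bytes, needle: bytes):
    """Return every offset at which needle occurs in haystack."""
    offsets, start = [], 0
    while (pos := haystack.find(needle, start)) != -1:
        offsets.append(pos)
        start = pos + 1
    return offsets

def search_fragments(haystack: bytes, dump: bytes, size: int = 8):
    """Search for each size-byte slice of a memory dump.

    Returns {offset_in_dump: [offsets_in_haystack]} for slices that were found.
    Searching fragments helps when only part of the data survives unchanged.
    """
    results = {}
    for i in range(0, len(dump) - size + 1, size):
        hits = find_all(haystack, dump[i:i + size])
        if hits:
            results[i] = hits
    return results

# Stand-in data: the second half of the "dump" exists in the "file", the first does not.
rom = bytes(range(64))
dump = b"\xff" * 8 + bytes(range(16, 24))
print(search_fragments(rom, dump))
```

If nothing matches at any fragment size, compression or manipulation between disc and memory is the likely culprit.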

Relative searching.
Half decoding method, half search method. In most languages that are not logographic/ideographic the letters have a defined order (even in Japanese, though the kana have a few accepted orders). More generally this means A=1 B=2 C=3 and so on, or something like it, for the text encoding.
You pick a string from the game and the tool looks for byte patterns in the game whose differences match the letter order; this works if the game uses such a relative encoding.
http://www.romhacking.net/utilities/513/ is my tool of choice for this though Crystaltile2 also works for me.
There are tricks to picking a good string (try not to have capitals and lower case mixed, try not to have variables in the searched string, try not to have sections of different fonts/text effects in the searched string, try not to have a new screen, new paragraph or even new line start in the searched string, long is good but too long is possible (usually by virtue of things in the previous point popping up), short works but that often makes for a mountain to dig through) and I have been known to live dangerously and do a search for something like " the " as the with spaces either side is quite common in English.
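
For the curious, a minimal sketch of what a relative search does under the hood, assuming a single byte per character and an encoding that keeps alphabetical order. The function name and the toy encoding (a = 0x10) are made up for the example; this is not how any particular tool is implemented.

```python
def relative_search(data: bytes, text: str):
    """Find offsets where consecutive byte differences match those of text."""
    if len(text) < 2:
        raise ValueError("need at least two characters to compare differences")
    # Differences between neighbouring letters, e.g. "abd" -> [1, 2]
    diffs = [ord(b) - ord(a) for a, b in zip(text, text[1:])]
    n = len(text)
    hits = []
    for i in range(len(data) - n + 1):
        window = data[i:i + n]
        if all(window[j + 1] - window[j] == diffs[j] for j in range(n - 1)):
            hits.append(i)
    return hits

# Toy encoding for the demo: 'a' is stored as 0x10, 'b' as 0x11, and so on.
encoded = bytes(0x10 + ord(c) - ord("a") for c in "attack")
data = b"\x00\x00" + encoded + b"\xff"
print(relative_search(data, "attack"))  # [2]
```

Once you have a hit, the first byte of the match against the first letter of the string gives you the base value for the whole table.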

Text decoding methods are many and varied too. You can do things like frequency analysis (space is the most common character in almost all texts, spaces are rarely more than 12 characters apart, Scrabble letter scores are picked for a reason), linguistic analysis (nearly every word in English has a vowel or y in it), combination analysis (if you have the space character you can fill in the blanks)... and you eventually arrive at my much enjoyed method, "corruption".
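
A small sketch of the frequency analysis side, assuming one byte per character. The helper names are made up; the idea is simply that the most frequent byte in a text block is a good candidate for the space character, and that you can sanity check a candidate by looking at how far apart its occurrences are.

```python
from collections import Counter

def byte_frequencies(data: bytes, top: int = 5):
    """Return the most common byte values as (value, count) pairs."""
    return Counter(data).most_common(top)

def max_gap(data: bytes, value: int) -> int:
    """Longest run of other bytes between two consecutive occurrences of value."""
    positions = [i for i, b in enumerate(data) if b == value]
    return max((b - a - 1 for a, b in zip(positions, positions[1:])), default=0)

# Toy "text block" where 0x05 plays the role of the space character.
block = b"\x05\x01\x05\x02\x03\x05"
print(byte_frequencies(block, 1))
print(max_gap(block, 0x05))
```

A candidate space byte whose gaps regularly run to dozens of characters is probably not the space.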

Corruption.
You have an emulator that runs the game from your hard drive, or if you are feeling time happy and money happy then you could burn discs every time if you like.

This affords you the option to change something in the game (do note the filesystem stuff from earlier) and run it. If something is now broken and you changed something, then what you changed is probably related. The game could crash, the game might not load what you want for a while and other things like that, but eventually you zero in on things.

Going back to the text decoding for a moment I like to change text to repeating patterns (if you stick a long run of a single character in the text and it displays in the game then you know what that character is, what the ones beside it are and can work out the rest. Do it properly and add a pattern like 1112 1213 1313 ..... and you see a pattern in the game of characters getting one more in a run and one higher and you are laughing).

You can do this with any halfway decent hex editor by inverting things (which has the added bonus of being able to turn it back easily), filling sections, pasting in new data.....
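
If you would rather script it than do it by hand, the two tricks above (inverting a region, filling a region with a repeating pattern) look roughly like this. A sketch only: the function names are made up, and both keep the data the same length so nothing else in the file shifts.

```python
def invert_range(data: bytes, start: int, length: int) -> bytes:
    """Flip every bit in data[start:start+length]; applying it twice restores the original."""
    buf = bytearray(data)
    for i in range(start, min(start + length, len(buf))):
        buf[i] ^= 0xFF
    return bytes(buf)

def fill_pattern(data: bytes, start: int, pattern: bytes, length: int) -> bytes:
    """Overwrite data[start:start+length] with pattern repeated as often as needed."""
    buf = bytearray(data)
    for i in range(start, min(start + length, len(buf))):
        buf[i] = pattern[(i - start) % len(pattern)]
    return bytes(buf)

original = bytes(range(8))
corrupted = invert_range(original, 2, 4)
assert invert_range(corrupted, 2, 4) == original  # inverting again undoes it
print(fill_pattern(b"\x00" * 6, 1, b"\x11\x12", 4).hex())  # 001112111200
```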

Some consider this a bit crude... and it is, but it is potent. Though the traditional "corrupt half the ROM and go from there" approach is probably to be avoided, a slightly more subtle approach can see it do things for you (the text decoding by forcing the game is one example -- why spend 20 minutes messing around with assembly code when you can just force the game to decode it and note the result?).
Two main issues: game devs probably did not expect random changes in their data, so games can get a bit odd when you do this, and in some instances the corruption can trigger antipiracy protection (again, game devs did not expect random changes -- it is called a ROM or CD/DVD-ROM for a good reason). Any change might trigger this depending upon the game.

Analysis
I already mentioned magic stamps but the ability to detect certain types of files and methods (a process known as fingerprinting) is much more varied than that. On the GBA the cart is typically read from 08??????, so if you see lots of 08 values with 6 seemingly random hex digits in between them (better yet if they get larger every time) you probably have a pointer section. Pointers to what I have no idea, but pointers... well, they point to things that are typically not random data and they will probably also have given you the lengths and starts of the sections at hand (they point to the start of things, and the next one will come at the end of or shortly after the previous one).
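
A sketch of spotting a GBA style pointer section automatically. It assumes the pointers are stored as 4 byte aligned, little-endian 32-bit words (so 08?????? shows up in a hex editor with the 08 as the last byte of each group of four); `find_pointer_runs` and the `min_run` threshold are made up for the example.

```python
import struct

def find_pointer_runs(data: bytes, min_run: int = 4):
    """Return (offset, count) for runs of aligned 32-bit LE words of the form 0x08xxxxxx."""
    runs, run_start, count = [], None, 0
    for off in range(0, len(data) - 3, 4):
        (word,) = struct.unpack_from("<I", data, off)
        if word >> 24 == 0x08:
            if count == 0:
                run_start = off
            count += 1
        else:
            if count >= min_run:
                runs.append((run_start, count))
            count = 0
    if count >= min_run:  # a run that reaches the end of the data
        runs.append((run_start, count))
    return runs

# Toy data: 8 bytes of padding, a 4 entry pointer table, then filler.
table = struct.pack("<4I", 0x08001000, 0x08001040, 0x08001100, 0x08002000)
print(find_pointer_runs(b"\x00" * 8 + table + b"\xff" * 4))  # [(8, 4)]
```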
Many around here will seemingly also stare at a hex editor and decode files. This is not exactly what happens, as they will probably be shifting the line length (if each section gives over so many bytes to the name, so many to the location and so many to the size, it might not be obvious if you just see the world 10h bytes at a time; shifting it can have each of those appear on one line and things get very easy from there) and looking for common things (lengths of files to be referenced, magic stamps further down in archive files and things that point to just before them). Beyond that, pasting a few lines into a spreadsheet and manipulating numbers can make for some interesting things. The obvious one is if your archive format does not have file lengths and you need those to extract things: by taking the start of the next file and subtracting the start of the current one you have your length value or something close to it (for various reasons game devs will pad files out so the next one starts at a nice number).
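
The spreadsheet trick for file lengths boils down to a subtraction, sketched here. The function name and the example offsets are made up; the last file's length is taken against the archive size, which is an assumption about where the data ends.

```python
def lengths_from_offsets(offsets, archive_size):
    """Given file start offsets, return (start, length) pairs.

    Each length is next_start - start, so it includes any padding the
    developers added to align the next file.
    """
    ordered = sorted(offsets)
    ends = ordered[1:] + [archive_size]
    return [(start, end - start) for start, end in zip(ordered, ends)]

for start, length in lengths_from_offsets([0x20, 0x100, 0x180], 0x200):
    print(hex(start), hex(length))
```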
Back to the stats thing: you have the game in front of you. If the game is nice enough to provide you with a bestiary then that is probably also the order the game stores the enemy stats in (if not, it probably is still logical -- as the dev you are effectively making a database after all, and a bad database schema is to be avoided). If the game bestiary tells you attack, def, hp... then convert those numbers to hex and search for them (this is also the reason why stats are often limited to powers of two, or half that in the case of signed values).
Analysis also extends to looking at the thing in a tile editor, noting the sections that are 2D graphics and ignoring them, or looking for graphical patterns in other data (humans are pretty good at seeing patterns in colours and less so at seeing them in raw hex).
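
Converting known stats into byte patterns to search for might look like the sketch below. The field widths and the little-endian byte order are assumptions you would have to test for a given game, the stat values are invented, and `stat_patterns` is a made-up helper.

```python
import struct

def stat_patterns(stats, formats=("<B", "<H", "<I")):
    """Pack consecutive stats at several assumed widths (8, 16, 32 bit little-endian).

    Returns {format: bytes}; widths too small to hold a value are skipped.
    """
    patterns = {}
    for fmt in formats:
        try:
            patterns[fmt] = b"".join(struct.pack(fmt, s) for s in stats)
        except struct.error:
            continue  # a value does not fit in this width
    return patterns

# Invented example: HP 77 followed by attack 300.
for fmt, pattern in stat_patterns([77, 300]).items():
    print(fmt, pattern.hex(" "))
```

Each resulting pattern can then be fed to whatever hex search you like; searching two or three adjacent stats at once cuts down on false hits a great deal.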

Anyhow 3am is rapidly approaching so I will tie it off here. Naturally you can combine methods (I encourage it) and this was but the briefest of overviews so if you can see other approaches that come from others here or are logical variations on the theme then congratulations you are probably destined to be a ROM hacker.

Yohyzo

  • Newbie
  • *
  • Posts: 2
    • View Profile
Re: Data Finding
« Reply #3 on: June 26, 2013, 04:30:13 am »
Thank you for that very informative reply. I understood most of it (sorta) and I've already done a lot of hex searching and relative searching without any luck. Right now I'm trying to use an emulator and Cheat Engine (which seems to have a debugger) to find the data. I've already found the data in memory and I can use Cheat Engine to set breakpoints and maybe trace the data though I'm not quite sure how it works.

This is where I'm at:



That 4d that is highlighted in green is the monsters HP and other stats follow shortly after that.

And is it possible the data would not show up when I look at the contents of the disk? Like hidden files or something?

Another thing I was wondering was how to re-insert any modified files back into the ISO. I used to use CDMage for psx isos but it doesn't work for ps2.

FAST6191

  • Hero Member
  • *****
  • Posts: 3008
    • View Profile
Re: Data Finding
« Reply #4 on: June 26, 2013, 06:39:02 am »
Health is a tricky one -- being inherently variable (you want to kill your enemy after all) and often drawn up from an equation (level * 200 + constitution.....) it is not always stored as a raw value. Still I would definitely fire a search off for the data that follows it in the game. Despite being a huge fan of the earlier games I have not yet played the game in question or even read a FAQ so I can not offer any advice on the matter.

Cheat breakpoints and debugger breakpoints are subtly different. Both are very useful for what they do but they do not play in the other's world quite so well as you might expect (cheats are a bit inflexible, and debuggers are too broad to be easily useful and too narrow to allow for the more fuzzy search things that cheat makers rely upon). There is also the secondary issue I forgot to mention earlier of memory allocation -- more advanced programming languages (which the PS2 could and did make use of) will not allocate memory at compile time but at run time (if you are familiar with cheats, this is what one type of pointer cheat, probably the most common type, deals with), so something might not be in the same place twice (quite frustrating if you are repeating a process but worked around easily enough -- if you make a pointer cheat you have all you need to direct the breakpoints).
Still the idea is simple enough:
The stats are probably there just for the battle, so you set a breakpoint on the area.
A new battle appears and the debugger halts things and says it has been written to.
If it does not say from where, you have to work backwards a bit, but usually it says where from.
From here you go to the source for the write; it might be somewhere else in memory or it might be directly from the disc somehow. If it is somewhere else in memory you get to repeat the process as you go down through the layers it took to get to the game (more layers is not beneficial to anybody so it is usually not many).
This is the tricky part on non-mapped consoles, as you get to decode how the disc is read.
Two main options:
1) It tends to narrow down at the hardware level. Here you watch the hardware responsible for the disc reads.
2) It tends to have a function in the game (this function is often common to all games for a system as well, as it is one of the things the console maker will provide people as part of the dev kit; the DS for instance uses one almost exclusively) -- the read function will expect a location, a size to read and a destination in memory. Do note the location need not be an actual location but could be a file number (also known as an ordinal read/call depending upon how you want to look at things for your given programming language) or a file name in the case of it using ISO 9660 (or other systems with defined filesystems). I do not know what goes on in the PlayStation world but I have seen hidden/virtual filesystems, so raw DMA reads (more on that in a moment) might have their own filesystem as well (though I mentioned the GBA as not having a filesystem, there are several that GBA homebrew developers can and do use).
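
Once the trace gives you the arguments to such a read (a location and a size), you can reproduce the read against the disc image on your PC. A sketch under stated assumptions: the location is a sector number, the image is a plain dump with 2048 bytes per sector, and `read_from_image` is a made-up stand-in, not any real console function.

```python
SECTOR_SIZE = 2048  # assumption: plain image with 2048 data bytes per sector

def read_from_image(image: bytes, sector: int, size: int) -> bytes:
    """Mimic a (location, size) disc read against an in-memory copy of the image."""
    offset = sector * SECTOR_SIZE
    return image[offset:offset + size]

# Toy image: one empty sector, then a sector that starts with a marker.
image = bytes(SECTOR_SIZE) + b"STAT" + bytes(SECTOR_SIZE - 4)
print(read_from_image(image, 1, 4))  # b'STAT'
```

If the bytes you get back match what the debugger showed landing in memory, you have found where the data lives on the disc.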

Editing PS2 isos.... that might be the DMA and ISO9660 thing speaking or it might just be CDMage falling short (I tend to either use a hex editor or hope one of the other tools works -- I have not touched CDMage for anything in years though that does not mean it is bad, just that I have not used it).
ISO 9660 is the standard way to hold data on a disc (give or take UDF and Joliet, which you might have seen in your burning program) though it has a few variations within itself too. Not all consoles use it (indeed most do not) but the PlayStation brand does.
However you also have the raw DMA option as well, companies like Square (Enix) and other RPG makers being quite fond of it. The reasons for its use are manifold (part anti-piracy, part hacker frustration, part actual technical good reason, part possible incompetence and so on) but that matters little here. The idea is that where ISO 9660 has a nice list of file names, locations, sizes and more, the raw DMA approach (DMA = direct memory access in other instances, and it kind of applies here too) reads directly from the disc and can have files/data not included in the ISO 9660 file listing but readily available to the console. This is also why something like CDMage might not extract much data from your multi-gig ISO. Also, if CDMage disturbs the raw DMA data because it edits blindly then you get into trouble. If I am playing PlayStation hacker I tend to have to figure out what goes on here, but raw hex editor replacement works well enough if you only have a few light edits to make (and given at this point this is analysis and testing, you only have a few light edits to make).
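
The "raw hex editor replacement" can be scripted as a same-size overwrite, sketched below. The helper name, offsets and values are invented for the example; the checks are there so a light edit cannot shift data or hit the wrong spot.

```python
def patch_in_place(image: bytearray, offset: int, old: bytes, new: bytes) -> None:
    """Overwrite old with new at offset, refusing anything that would change the size."""
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    if image[offset:offset + len(old)] != old:
        raise ValueError("bytes at offset do not match what was expected")
    image[offset:offset + len(new)] = new

# Toy image: change a 16-bit little-endian value from 0x004D to 0x00FF.
image = bytearray(b"\x00\x4d\x00\x00")
patch_in_place(image, 1, b"\x4d\x00", b"\xff\x00")
print(image.hex(" "))  # 00 ff 00 00
```

For a real disc image you would read the file into a bytearray (or seek and write within the file) rather than build one by hand, and keep an untouched copy to go back to.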