Topic: Find-Replace Patcher

STARWIN

Find-Replace Patcher
« on: December 15, 2014, 10:32:53 am »
Recently I tested whether I could apply a patch meant for a headered version of a SNES ROM to the headerless ROM. The patch formats I tested were IPS, UPS, BPS, PPF, xdelta3, bsdiff and Linux diff/patch. None of them produced the file I get by adding the header, patching, and removing the header again.

Some failed because they use absolute file offsets, and others failed because of a checksum mismatch (so I don't know whether they would otherwise have produced the correct file). Linux diff/patch seemed to work only for text. Overall, I imagine all of these are meant to turn one specific file into another, and so they operate on file offsets.

Does anyone know of a patcher that would pass this simple test?

It is interesting that the lack of a checksum allows some IPS patches to target differing source files with the desired outcome, since checksums are otherwise thought of as a good thing. IPS only requires the file size and structure to remain the same at the affected offsets, and the way the file uses those offsets to remain the same. If these offsets contain program code, there is the added requirement that the code's RAM manipulation doesn't conflict with any other code that gets executed.
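To make the "no checksum" point concrete, here is a minimal sketch of an IPS applier. It assumes the commonly documented record layout (a `PATCH` magic, records with a 3-byte big-endian offset and 2-byte size, a size of 0 marking an RLE record, and an `EOF` terminator) and ignores the optional truncation extension. The function name `apply_ips` is made up for illustration.

```python
def apply_ips(source: bytes, patch: bytes) -> bytes:
    """Apply an IPS patch. Nothing here verifies which source file we got."""
    if patch[:5] != b"PATCH":
        raise ValueError("not an IPS patch")
    out = bytearray(source)
    pos = 5
    while patch[pos:pos + 3] != b"EOF":
        if pos + 5 > len(patch):
            raise ValueError("truncated IPS patch")
        offset = int.from_bytes(patch[pos:pos + 3], "big")
        size = int.from_bytes(patch[pos + 3:pos + 5], "big")
        pos += 5
        if size == 0:
            # RLE record: 2-byte run length followed by a single fill byte.
            run = int.from_bytes(patch[pos:pos + 2], "big")
            data = patch[pos + 2:pos + 3] * run
            pos += 3
        else:
            data = patch[pos:pos + size]
            pos += size
        # Writes past the end of the file simply grow it (zero-filled gap).
        if len(out) < offset + len(data):
            out.extend(bytes(offset + len(data) - len(out)))
        out[offset:offset + len(data)] = data
    return bytes(out)
```

Because every record is just "write these bytes at this absolute offset", the same patch applies to any file without complaint, which is exactly why it can work on a compatible variant and also why it silently lands 512 bytes off on a headerless ROM.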

In a typical (?) patching process, offsets are used to identify the location of the bytes that will be changed, removed or inserted. Another way to identify the location would be to search for a unique byte sequence that resides in or near it, and then perform the necessary operation there.
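A minimal sketch of that search-based approach, assuming the simplest possible case where the identifier is exactly the bytes to be replaced. The names `find_unique` and `find_replace` are hypothetical, and the sample bytes are invented for the example.

```python
def find_unique(data: bytes, needle: bytes) -> int:
    """Return the position of needle, insisting that it occurs exactly once."""
    first = data.find(needle)
    if first == -1:
        raise LookupError("identifier not found")
    # Searching again from first + 1 also catches overlapping occurrences.
    if data.find(needle, first + 1) != -1:
        raise LookupError("identifier is not unique")
    return first


def find_replace(data: bytes, identifier: bytes, replacement: bytes) -> bytes:
    """Replace the single occurrence of identifier, wherever it sits in the file."""
    at = find_unique(data, identifier)
    return data[:at] + replacement + data[at + len(identifier):]


# Invented sample data: the same change applied with and without a 512-byte header.
rom = bytes(16) + b"\xA9\x05\x8D\x00\x21" + b"\xFF" * 16
headered = bytes(512) + rom
patched = find_replace(rom, b"\xA9\x05\x8D\x00\x21", b"\xA9\x07\x8D\x00\x21")
patched_headered = find_replace(headered, b"\xA9\x05\x8D\x00\x21", b"\xA9\x07\x8D\x00\x21")
```

Here `patched_headered[512:]` equals `patched`, which is the headered/headerless test from the start of the post.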

I think I might try designing/implementing a patcher that operates this way.

Ideally this more local approach would allow one patch to work against many differing source file versions. Examples include headered/headerless ROMs and different language/release versions of games, as long as they share the bytes to be changed. A less common use case would be targeting bytes that some other hack has moved to a different location, though all inserted references and jumps containing an absolute offset to a shifted location would still break (at least unless the patcher has knowledge of the architecture in question).

Creating a patch format just to solve the SNES headered/headerless issue would be trivial (offsets would be enough), but taking the more generic route sounds more interesting, as it should solve the simpler case as well.

The design consists of two primary parts: the identifier and the operation. I'll focus on the identifier.

While some kind of hash function could be used for identification, I think it is simplest to use the raw bytes themselves, unless the lack of uniformity in the data becomes an issue (leading to long identifier sequences). Though if rare bytes are the ones being modified, the lack of uniformity is actually beneficial.

The identifier should be as local as possible. For example, if data is changed at the beginning of a SNES ROM, a patch using header bytes to identify the location will fail for a headerless ROM. Ideally the identifier resides within the change location, so if it cannot be found, the file couldn't be patched anyway. However, in general this degree of locality is unrealistic, so the patcher can only be about as local as the file structure allows.

The perfect example of both non-locality and a long identifier sequence would be code inserted in the middle of a long run of 00s or FFs ("empty space"). I think the best way to solve this is to add a relative offset to the identifier (allowing us to name the emptiness by the unique bytes at its edge plus an offset). Some locality has to be sacrificed here.
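A sketch of the identifier-plus-relative-offset idea, under the assumption that the anchor is a unique byte sequence at the edge of the empty space and the offset is counted from the start of the anchor. The function name `write_relative` and the sample bytes are hypothetical.

```python
def write_relative(data: bytes, anchor: bytes, rel_offset: int, payload: bytes) -> bytes:
    """Overwrite bytes at (position of unique anchor) + rel_offset."""
    start = data.find(anchor)
    if start == -1 or data.find(anchor, start + 1) != -1:
        raise LookupError("anchor missing or not unique")
    target = start + rel_offset
    if target < 0 or target + len(payload) > len(data):
        raise ValueError("write would fall outside the file")
    return data[:target] + payload + data[target + len(payload):]


# Invented sample: some code bytes, then 32 bytes of FF "empty space".
rom = b"\x12\x34\x56\x6B" + b"\xFF" * 32 + b"\x78"
# The anchor names the last real bytes before the gap; the payload goes
# 8 bytes into the gap, i.e. len(anchor) + 8 bytes after the anchor start.
patched = write_relative(rom, b"\x56\x6B", 2 + 8, b"\xEA\xEA")
```

The payload location inside the run of FFs could never be identified by its own bytes, but the pair (anchor, offset) still moves with the data if a header is prepended.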

As this type of patch is meant for an ambiguous collection of source files, failure will always be a possibility. If the byte sequence is not found anywhere in the file, the patcher won't know where to operate. If the source file is the wrong one, this is good, as we can then judge the validity of the file without a conventional checksum. If a differing file has exactly one matching spot, patching will happen, and in the worst case the game will break later during play. But how likely is this to be a problem? I'd say only with shifted data (applying a hack on top of another hack) with absolute references.

One could ask why the identifier has to be unique, as just adding a count to any identifier would make it unique. The reason is that locality would suffer, and the patcher would also get confused if data was swapped. Just as with file offsets.

I can think of some extra functionality for patch creation. Giving a collection of alternate source files as hints to the patcher would allow it to choose identifiers that uniquely match all files, or notify the user of the task being impossible. Architecture knowledge would be both an interesting and scary form of extra functionality (think of searching for correct places to point pointers/jumps to).
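The hint-file idea could look roughly like this at patch creation time: grow the candidate identifier around the change until it occurs exactly once in the source and in every hint file, or give up. This is only a sketch; `choose_identifier`, `count_occurrences` and the `max_context` limit are assumptions made for the example.

```python
def count_occurrences(data: bytes, needle: bytes) -> int:
    """Count occurrences of needle in data, including overlapping ones."""
    count, pos = 0, data.find(needle)
    while pos != -1:
        count += 1
        pos = data.find(needle, pos + 1)
    return count


def choose_identifier(source, change_at, change_len, hints, max_context=64):
    """Return (identifier, offset of the change inside it), or None if impossible."""
    for extra in range(max_context):
        # Widen the window by `extra` bytes on each side, clipped to the file.
        start = max(0, change_at - extra)
        end = min(len(source), change_at + change_len + extra)
        candidate = source[start:end]
        if all(count_occurrences(f, candidate) == 1 for f in [source, *hints]):
            return candidate, change_at - start
    return None  # no identifier within max_context matches every file uniquely


# Invented sample: "XY" occurs twice, so one byte of context per side is needed.
source = b"AAAAXYBBBBQXYCC"
hint = b"hdrhdrhdr" + source
result = choose_identifier(source, 4, 2, [hint])
```

With this sample data `result` is `(b"AXYB", 1)`; passing a hint file that lacks the bytes altogether makes the function return `None`, which is the "task is impossible" notification.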

I don't have any special ideas about the operations themselves, which probably means that something primitive can be made without much thinking. Detecting a data shift would mean scanning the source file for each difference during patch creation. If nothing is found, insert new data. If something is found, move the source scan pointer there and continue. I'll probably think more about it if something doesn't work.
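One way to read that scan loop is as a greedy delta encoder. The sketch below is only an interpretation: the op names, the `probe` length and the use of absolute source positions in `copy` ops are assumptions (a real find-replace format would store identifiers instead of positions).

```python
def make_ops(source: bytes, target: bytes, probe: int = 4):
    """Greedy scan producing ("copy", src_pos, length) and ("insert", data) ops."""
    ops = []
    s = t = 0  # scan pointers into source and target
    while t < len(target):
        if s < len(source) and source[s] == target[t]:
            # Bytes agree: extend the matching run as far as possible.
            n = 0
            while (s + n < len(source) and t + n < len(target)
                   and source[s + n] == target[t + n]):
                n += 1
            ops.append(("copy", s, n))
            s += n
            t += n
            continue
        # Difference: look for the upcoming target bytes elsewhere in the source.
        found = source.find(target[t:t + probe]) if t + probe <= len(target) else -1
        if found != -1:
            s = found  # data was shifted: move the source scan pointer there
        elif ops and ops[-1][0] == "insert":
            ops[-1] = ("insert", ops[-1][1] + target[t:t + 1])
            t += 1
        else:
            ops.append(("insert", target[t:t + 1]))  # genuinely new data
            t += 1
    return ops


def apply_ops(source: bytes, ops) -> bytes:
    """Rebuild the target from the source and the op list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += source[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

For example, with source `b"HEADERabcdefghij"` and target `b"abcdXXefghij"`, the scan jumps past the header, copies `abcd`, inserts `XX`, and copies the rest.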

Any feedback on the idea? Issues noticed?

Nightcrawler

Re: Find-Replace Patcher
« Reply #1 on: December 15, 2014, 07:30:23 pm »
Quote
Does anyone know of a patcher that would pass this simple test?

Yes, NINJA does.

Quote
In a typical (?) patching process, offsets are used to identify the location of the bytes that will be changed, removed or inserted. Another way to identify the location would be to search for a unique byte sequence that resides in or near it, and then perform the necessary operation there.

You could, but that methodology is subject to false positives and bloated patch sizes, since you need a long identifying byte string to be more reliable. I think most people are more interested in reliable patching.

I think a better approach is a patching format (such as NINJA) that strips away superfluous information such as the header, and reliably applies changes using offsets and operations relative to the common useful area of interest (what remains after stripping). This pretty much takes care of all practical cases where you can reliably apply the same patch to different source files. You could make an argument for cases where the bytes are shifted. However, data shifting is usually accompanied by altered loading code to go with the new location. This would render most patches of this nature incompatible unless the patch were very simple and changed data only. That's probably going to be the minority of cases.
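The strip-then-patch approach can be sketched as follows. This is not NINJA's actual format or code. It assumes the usual 512-byte SNES copier header and the common heuristic that a headered ROM has a size of 512 modulo 1024; the function names are hypothetical.

```python
HEADER_SIZE = 512  # size of the optional SNES copier header


def split_header(rom: bytes):
    """Split a ROM into (header, body) using the size % 1024 == 512 heuristic."""
    if len(rom) % 1024 == HEADER_SIZE:
        return rom[:HEADER_SIZE], rom[HEADER_SIZE:]
    return b"", rom


def patch_body(rom: bytes, changes) -> bytes:
    """Apply (offset, data) changes relative to the headerless body."""
    header, body = split_header(rom)
    out = bytearray(body)
    for offset, data in changes:
        if offset + len(data) > len(out):
            raise ValueError("change falls outside the ROM body")
        out[offset:offset + len(data)] = data
    # Put back whatever header the input had, so the output keeps its form.
    return header + bytes(out)


# Invented sample: one change list applied to both forms of the same ROM.
body = bytes(1024)
headered = b"\x01" * HEADER_SIZE + body
changes = [(16, b"\xAA\xBB")]
patched_plain = patch_body(body, changes)
patched_headered = patch_body(headered, changes)
```

Both outputs carry the change at the same body-relative offset, and the headered file keeps its original header untouched.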