
## Algorithm need: bin-packing with restrictions

Started by Bisqwit, August 31, 2022, 08:29:47 AM

#### Bisqwit

While working on my Simon's Quest ROM hack, I happened across a need for a particular algorithm that I am sure many other ROM hackers have dealt with previously. The formal description of the algorithm is below; its applied goal is to fit several assembler objects into holes of available free space within the ROM, such that if some part of the code requires that jumps or tables do not cross page boundaries, placements are assigned according to those limitations. In the more frequent case, some objects might be placeable only in certain ROM banks, but I already have a solution for that.
------
This is a variant of the NP-hard 1-dimensional bin packing problem, with per-object placement restrictions. It attempts to find a non-overlapping combination of positions for all objects in the given space.

The input to the algorithm is:
— Number of objects (N). You may assume that 0 ≤ N < 2¹⁶.
— Number of addresses (M). You may assume that 0 ≤ M < 2²⁴. Addresses are numbered 0 ≤ m < M.
— List of objects, numbered 0 ≤ n < N. Each object consists of:
—— Length j, which is a non-negative integer. You may assume that 1 ≤ j ≤ M.
—— A set of zero or more non-overlapping ranges called bins, [m₁, m₂), that object n is permitted to occupy, such that 0 ≤ m₁ < m₂ ≤ M. These ranges may neither overlap nor intersect, but they may be consecutive (i.e. for any two ranges [k₁, k₂) and [l₁, l₂) of the same object with k₁ < l₁, it must hold that k₂ ≤ l₁). Sets of ranges of different objects can both overlap and intersect.

The output of the algorithm is:
— List of placements, numbered 0 ≤ n < N. Each placement is an integer starting address 0 ≤ p < M assigned to object n, or else M if the object could not be placed anywhere.

The output of the algorithm must satisfy the following two conditions:
— For each object n, if pₙ ≠ M, there exists one range [mₙ,₁, mₙ,₂) such that pₙ ≥ mₙ,₁ and (pₙ + jₙ) ≤ mₙ,₂. In other words, each object must completely fit within one of the allowed ranges they are given.
— For every 0 ≤ t,u < N, where t ≠ u and pₜ ≠ M and pᵤ ≠ M, it holds that either pᵤ ≥ (pₜ + jₜ), or pₜ ≥ (pᵤ + jᵤ). In other words, the placements for two objects may neither overlap nor intersect.

Corollaries:
— Objects may be placed consecutively.
— Spanning across ranges is not permitted: a placement where first half of the object fits in one range and the rest fits in another range is not permissible, even if the second range begins exactly where the first ends.
— For each object n, if pₙ ≠ M, it holds that (pₙ + jₙ) ≤ M. In other words, objects must fit within the given address space.
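The two output conditions can be made concrete with a minimal validity checker. This is a sketch only; the `objects`/`placements` representation is chosen here for illustration and does not come from any particular tool:

```python
def is_valid_packing(M, objects, placements):
    """objects: list of (length, ranges), where ranges is a list of half-open
    (m1, m2) bins. placements: one start address per object, or M if unplaced."""
    placed = []
    for (length, ranges), p in zip(objects, placements):
        if p == M:
            continue  # unplaced objects are exempt from both conditions
        # Condition 1: the object must fit entirely inside one allowed range.
        if not any(m1 <= p and p + length <= m2 for m1, m2 in ranges):
            return False
        placed.append((p, length))
    # Condition 2: no two placed objects may overlap or intersect.
    placed.sort()
    return all(p1 + j1 <= p2 for (p1, j1), (p2, _) in zip(placed, placed[1:]))
```

Note that condition 1 also implies the third corollary: an object inside an allowed range automatically fits within the address space.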

The performance of the algorithm should meet the following:
— Placing 4000 objects, each with 20 possible bins that are identical on most but not all objects, should not take more than 1 second on a Haswell-class machine (2014 workstation).

Additionally, the algorithm should strive to minimize the number of separate contiguous address ranges where something can still be placed, but it is not necessary to find an optimal solution.

----

For reference, the classical bin-packing problem differs from the above in that the list of ranges [m₁, m₂) is not given separately for each object, but shared by all objects.
I already have a satisfying solution for this simplified version of the problem, and it is what my current insertor uses. However, it does not seem there is an easy way to build upon that solution to solve the larger problem.

----

Many ROM hackers choose the placements manually, but my insertion process includes literally thousands of combinations of variables that affect the availability of certain regions, and I don't want to choose them manually anymore, if automation is possible. (In other words, don't just reply to tell me it's stupid to require such an algorithm.)

#### FAST6191

Do we also get to add in an additional jump between your fragments if they are too large, or where it would otherwise make sense?

Personally I would look at some of the solvers for staff schedules, as this reminds me very much of those (one I played with a while back had time needing to be covered, staff available, supervisors, and downtime between shifts, which pretty much map directly onto your requirements, could be used to act as restrictions, and could be used to work around human placements as well).

Granted, if it has come to this sort of thing (and I felt the need to finish it), I would also look at what it might take to refactor the existing code to not have to do this -- there can't be that many pointers, relative jumps and whatever else.

#### Bisqwit

The comparison to staff schedulers is very apt, I hadn't even thought of that. Nonetheless, it does not bring me any solution...

My toolchain does not support automatically adding a jump between fragments. Instead, I have already split the code into fragments as small as I reasonably can, such that every function that doesn't directly branch into some other function is a separate fragment; likewise for most items of constant data, much like the -ffunction-sections and -fdata-sections options do on GCC. Particularly cold functions are set to be linked to any bank (i.e. anywhere at all), and are called through a trampoline function that facilitates cross-bank calls, although it is three bytes longer per call than a regular JSR.

Automatic splitting with a jump would also cause additional problems, because the two parts that previously were one can be connected with branches. Branches would require that the two fragments be placed in close proximity to each other (at least on the 6502, branch targets must be within about ±127 bytes of the branch), and I can't think of a way to codify this kind of relationship conveniently. (And the splitter would need to know where each instruction starts and ends, whether it's code or data to begin with, and whether the code is cycle-counted or not...)

#### Cyneprepou4uk

It sounds like you rearrange your tables 100 times a day, instead of choosing their positions once considering possible future expansion and forgetting about them.

Anyway, I don't see any questions in your post, only a wall of text, so I'll reply with that.

Quote: it's stupid to require such an algorithm

#### Bisqwit

Quote from: Cyneprepou4uk on August 31, 2022, 02:08:57 PM
It sounds like you rearrange your tables 100 times a day, instead of choosing their positions once considering possible future expansion and forgetting about them.
I think you missed the part where I mentioned thousands of combinations of variables. Each of which can be chosen by filling in options on a web form. Thanks for your (absolutely useless and hateful) input.

#### Raeven0

How well does best-fit-decreasing perform in this case? i.e. placing every object, in order from largest to smallest, into the smallest eligible unreserved space, or else at M if no eligible space exists

#### Bisqwit

Quote from: Raeven0 on August 31, 2022, 08:32:21 PM
How well does best-fit-decreasing perform in this case? i.e. placing every object, in order from largest to smallest, into the smallest eligible unreserved space, or else at M if no eligible space exists
That is what I am actually doing for the simpler case mentioned above. It works like this essentially:

— Step 0: Sort the items in descending order of size.
— Step 1: Sort the bins in ascending order of size.
— Step 2: For all bins (from smallest to biggest) that are smaller than small_bin_threshold,
—   If there is an item that is exactly the same size as the bin (use binary search), place the item in that bin, delete the object and the bin, and continue the loop.
—   If the number of items < threshold1, do the same for all unique combinations of two items.
—   If the number of items < threshold2, do the same for all unique combinations of three items.
—   If the number of items < threshold3, do the same for all unique combinations of four items.
— Step 3: For all items (from biggest to smallest),
—   Step 3A: Find the smallest available bin to fit this item, using binary search. Check how much room there would be left in the bin if the item was placed in this bin.
—   If the remaining room is nonzero, and number of remaining objects < threshold4, check if there is another item that together with this item, completely fills the bin. If so, delete these two objects and the bin, and continue the loop.
—   If there is another bin that is large enough for this item, run the operations outlined in step 2 on the currently selected bin. If the bin was deleted, go back to step 3A.
—   If no combination was found, place the object in the smallest available bin. Delete the object. If the bin is full as a result, delete the bin too. Otherwise move the bin to its right position to preserve the sorting.

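Stripped of the exact-fit combination search and the size thresholds, steps 0, 1 and 3 can be sketched roughly like this (a Python illustration; the `items`/`bins` layout is assumed, not taken from the actual insertor):

```python
import bisect

def best_fit_decreasing(items, bins):
    """items: list of sizes; bins: list of (start, size) free ranges.
    Returns one start address per item, or None if the item did not fit."""
    order = sorted(range(len(items)), key=lambda i: -items[i])  # step 0
    free = sorted(bins, key=lambda b: b[1])                     # step 1
    placements = [None] * len(items)
    for i in order:                                             # step 3
        size = items[i]
        # Step 3A: smallest bin that fits, found by binary search on sizes.
        k = bisect.bisect_left([b[1] for b in free], size)
        if k == len(free):
            continue                      # no bin is big enough
        start, bsize = free.pop(k)
        placements[i] = start
        if bsize > size:
            # Shrink the bin and re-insert it at its sorted position;
            # a completely filled bin stays deleted.
            rest = (start + size, bsize - size)
            pos = bisect.bisect_left([b[1] for b in free], rest[1])
            free.insert(pos, rest)
    return placements
```

This keeps the free-bin list sorted by size throughout, which is what makes the binary search in step 3A valid.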
However, it is not trivial to extend this solution for the problem described above. In which order should the items be processed? If we go largest first, we might inadvertently rob solutions from a smaller item that had very few possible placements.
For example, say we have a 4000-byte blob that can be placed anywhere within a 4096-byte bank (addresses 0–4095), and we have a 50-byte blob that can be placed at any address that is a multiple of 256 (that is, within one of 0–49, 256–305, 512–561, and so on up to 3840–3889). If we first place the 4000-byte blob at the beginning of that bank, the only remaining addresses are 4000–4095, and there is no way to place the 50-byte blob now, as all its candidate placements were robbed.
But if we first process items that have strict placement requirements, we might also inadvertently rob solutions from larger blobs.
And a far bigger problem is that there are now several sets of bins, some of which may be subsets of other sets.

#### Raeven0

Quote from: Bisqwit on September 01, 2022, 09:06:29 AM
For example, say, we have a 4000-byte blob that can be placed anywhere within a 4096 byte bank (addresses 0–4095), and we have a 50-byte blob that can be placed in any address that is a multiple of 256 (that is, within one of 0–49, 256–305, 512–561, and so on up to 3840–3889). If we first place the 4000-byte blob in the beginning of that bank, the only remaining addresses are 4000–4095, and there is no multiple of 256 in that region.
But if we first process items that have strict placement requirements, we might also inadvertently rob solutions from larger blobs.
In the universe of just these two blobs, the way I'm reasoning out the problem, placing the 4000-byte blob at 0 is not a best-fit placement, because doing so removes more than the minimum number of bins from the universe: specifically 1 more than the minimum, in that it unnecessarily removes the [0,50) bin. A best-fit placement would put the 4000-byte blob anywhere between 50 and 95.

I think this best-fit concept can be formalized with a cost function, say C(m) defined for each address m as the number of distinct objects having at least one bin that contains m; define the cost K(n,m) of placing object n (of length j) at address m as the sum of C(mᵢ) for mᵢ in [m, m+j); and then the best-fit placement for n is the m that minimizes K(n,m). (If K is minimized at more than one m, use the least such m.) If you place objects in order from largest to smallest, minimizing K for each placement, heuristically this avoids placing objects where they might prevent the placement of later objects.
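One way to read this cost function in code (a brute-force sketch; the data layout and function name are illustrative, and the O(N·M) cost table would need something smarter at real ROM sizes):

```python
def best_fit_address(obj_index, length, object_bins, M):
    """object_bins[n]: list of half-open (m1, m2) bins for object n,
    sorted by start address. Returns the K-minimizing start for the
    given object, or M if it fits nowhere."""
    # C(m): number of distinct objects with at least one bin containing m.
    # (Each object's own bins are non-overlapping, so no double counting.)
    C = [0] * M
    for bins in object_bins:
        for m1, m2 in bins:
            for m in range(m1, m2):
                C[m] += 1
    best = None
    for m1, m2 in object_bins[obj_index]:
        for m in range(m1, m2 - length + 1):
            K = sum(C[m:m + length])
            if best is None or K < best[0]:   # strict <, so ties keep least m
                best = (K, m)
    return best[1] if best is not None else M
```

In the test below, the second object's bins crowd both ends of the address space, so the cost-minimizing start for the first object lands in the uncrowded middle.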

But this is still only a heuristic, and there are probably universes where it will guess non-optimally. How much this matters depends on the quality of heuristic you need... If you need an optimal or at least bounded-error solution, it's back to the drawing-board.

#### Bisqwit

I managed to solve this problem to my satisfaction. I also found out that my initial description of the problem was still somewhat incomplete. The actual need was that there is a global set of bins, and each item has two types of restrictions that limit which bins they can be placed in, and at which addresses:
— Page requirement: Requires that (address / granularity) matches one of the given options, with granularity being typically 0x4000.
— Proximity requirement: Requires that (address + offs1) / modulo == (address + offs2) / modulo for some values of offs1, offs2 and modulo, where modulo is typically 0x100. The item might have an arbitrary number of these requirements, including zero.
There was also a third constraint, that I decided to drop for now:
— Group requirement: Requires that items having the same group key are stored in the same page (see page requirement, above).
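The two kept requirements translate directly into address predicates. A minimal sketch (the function names are mine; integer division mirrors the / in the formulas above):

```python
def satisfies_page(addr, granularity, allowed_pages):
    # Page requirement: (address / granularity) must be one of the options.
    return addr // granularity in allowed_pages

def satisfies_proximity(addr, offs1, offs2, modulo):
    # Proximity requirement: both offsets must land in the same
    # modulo-sized page when the item is placed at addr.
    return (addr + offs1) // modulo == (addr + offs2) // modulo
```

With granularity 0x4000 and modulo 0x100 these correspond to the typical NES bank size and the 6502's 256-byte pages.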

I was trying to convey the problem in a manner that distances itself as much as possible from hardware concepts, and in the process I introduced a significant flaw into my description that I did not notice until after several days of working on it.
But it is solved now.

https://bisqwit.iki.fi/cv2fin/dev/packbins-golden.zip is the source code, and it includes unit testing using GoogleTest framework.

Description: First, for each item, the global set of bins is split into multiple separate personal bins according to the constraints.
The page constraints are trivial. Let's focus on the proximity constraints:
Example: You have one 0x500 byte bin, that begins at address 0x1100.
And you have one 0x321-byte object that has a single constraint: offsets 0x22 and 0x50 must be placed in the same 0x100-byte page.
The solution is to split the bin into three bins:
[0x1100,0x14D0)  where valid starting addresses are 0x1100–0x11AF
[0x11DE,0x15D0)  where valid starting addresses are 0x11DE–0x12AF
[0x12DE,0x1600)  where valid starting addresses are 0x12DE–0x12DF
It is worth noting that these bins do overlap, contrary to my expectations when I initially described the problem.
If the object has multiple constraints, the same operation is performed on the resulting bins and the next constraint. That is, the bins may be iteratively split into more and more bins. This forms the item's personal bin set.
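The splitting step for a single proximity constraint can be sketched by brute force (this simply scans every candidate start address, unlike the actual implementation in the linked source, but it reproduces the worked example above):

```python
def split_bin(bin_start, bin_end, length, offs1, offs2, modulo):
    """Split one [bin_start, bin_end) bin for an object of the given length
    under one proximity constraint. Returns a list of
    (sub_start, sub_end, first_valid_start, last_valid_start) tuples."""
    def valid(a):
        # Both offsets must land in the same modulo-sized page.
        return (a + offs1) // modulo == (a + offs2) // modulo

    out, run = [], None
    for a in range(bin_start, bin_end - length + 1):
        if valid(a):
            run = (run[0], a) if run else (a, a)   # extend or start a run
        else:
            if run:
                out.append((run[0], run[1] + length, run[0], run[1]))
            run = None
    if run:
        out.append((run[0], run[1] + length, run[0], run[1]))
    return out
```

Each maximal run of valid start addresses becomes one sub-bin, extending far enough past its last valid start to hold the whole object; consecutive sub-bins overlap, as noted above.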

Next, all items are grouped such that items with the exact same set of personal bins are listed together. This creates a structure of bin sets: a list of different (unique) sets of bins. This list is sorted in ascending order using a metric that takes into account the number of bins in the set, the size of the biggest bin in the set, and an "importance" metric, which is the maximum number of constraints that produced this bin set.

Then, these bin sets are processed in the aforementioned order. The items subscribing to a bin set are placed into its bins using the simpler bin-packing algorithm described above. However, if more than one item subscribes to the bin set, any bins that overlap other bins are deleted first, because the simple algorithm cannot deal with overlaps and would otherwise assign the items overlapping addresses.
Once the item(s) are assigned to bins, give them physical addresses, and delete the corresponding address ranges from all of the remaining bin sets, to make sure the same address is not assigned twice.
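The grouping and ordering step might be sketched like this (a simplified illustration; the "importance" term of the metric is omitted, and the item/bin representation is assumed):

```python
from collections import defaultdict

def group_by_bin_set(items):
    """items: list of (item_id, personal_bins), where personal_bins is a
    tuple of half-open (start, end) bins. Returns (bin_set, item_ids)
    pairs, ordered by bin count and largest bin size, ascending."""
    groups = defaultdict(list)
    for item_id, bins in items:
        groups[tuple(sorted(bins))].append(item_id)   # identical sets share a key

    def metric(bin_set):
        # Smaller, more constrained bin sets get processed first.
        return (len(bin_set), max(e - s for s, e in bin_set))

    return sorted(groups.items(), key=lambda kv: metric(kv[0]))
```

Processing the most constrained bin sets first is what addresses the ordering problem from the earlier 4000-byte/50-byte example: items with few options are placed before larger, freer items can rob their addresses.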

Now that I have an automatic method for bin packing, I can be more liberal about where I add page-crossing constraints, to just slightly reduce the chances of lag, without much of a second thought.