I had to look up what 'kibitzing' was. Hah.
Hehe. I love it when someone else is willing to do all the work and just lets me sit in the back seat and kibitz about their driving habits.
Frees me up to be truly annoying.
Regarding a linker phase:
I can see the value in separating the build process into multiple steps for large projects, so you don't have to assemble the entire file each time. An interface for that is worth working on -- and such an interface would imply intermediate object files.
A few weeks ago, I would have agreed with this. But then I saw the Merlin32 assembler code. The darned thing loads up all of the assembly source files -- and I mean ALL OF THEM -- and linked-list chains up the lines in each of them. It then chains groups of those files into segments. (You can include multiple ASM files in a grouped segment.) It then chains the groups into the total project. It goes through each of the individual lines, assigning addresses and emitting opcodes and the like. Each line is its own tiny "code byte stream."
At the total project level, the author keeps another chained list of code and data patches, which the assembly phase adds whenever it can't resolve something on the spot, so that the link phase knows what to "patch in" later on. Once all that is done, each of the individual lines has a fully patched and decoded "byte stream." (The instructions are never longer than 4 bytes there. But the data can be longer, of course.)
The whole thing doesn't care about the physical memory system, or how any of that maps to a ROM, at this phase. It's just a whole lot of individual lines, each with its own ORG address plus a tiny strip of data attached to it. It's only at this point that the code decides how to do output. Note that you can ORG to some place, generate a little bit of code, ORG to somewhere else, generate a little more, etc. So far as these structures care, it doesn't matter if you have 100 separate lines all with the same ORG and different data. It just doesn't matter because none of it understands anything about the memory address space or the ROM. It's just "ORG + BYTES" everywhere.
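To make the "ORG + BYTES" idea concrete, here is a minimal Python sketch of the output step only. It is not Merlin32's actual code: the names `Line`, `apply_lines`, and `lorom_offset` are made up for illustration, and the LoROM-style address mapping is just one assumed example of a pluggable mapping function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Line:
    """One assembled source line: an address plus its resolved bytes (hypothetical type)."""
    org: int     # CPU address assigned to this line
    data: bytes  # fully patched bytes emitted for this line


def lorom_offset(addr: int) -> int:
    """Assumed example mapping: 24-bit CPU address to file offset, LoROM style, no header."""
    return ((addr & 0x7F0000) >> 1) | (addr & 0x7FFF)


def apply_lines(rom: bytearray, lines: Iterable[Line],
                to_offset: Callable[[int], int]) -> None:
    """Write every line into the ROM image. Only `to_offset` knows about the memory map."""
    for line in lines:
        start = to_offset(line.org)
        end = start + len(line.data)
        if end > len(rom):
            raise ValueError(f"line at ${line.org:06X} falls outside the ROM image")
        rom[start:end] = line.data


# The lines can arrive in any order and jump around the address space freely.
rom = bytearray(0x10000)
lines = [
    Line(0x008000, b"\xA9\x01"),      # lda #$01
    Line(0x018000, b"\x60"),          # rts, in another bank
    Line(0x008002, b"\x8D\x00\x21"),  # sta $2100
]
apply_lines(rom, lines, lorom_offset)
```

Supporting different mapping hardware in a design like this would only mean passing a different `to_offset` function; the line records themselves stay the same.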
I had a very easy time jumping into that thing and making it patch a ROM file directly and to support any kind of mapping hardware.
I'm not suggesting you consider using that thing. I'm just pointing out that it didn't need to support separate compilation or object file formats. It's really easy from a user point of view because they don't have to know about separate compilation or object files. So far as they know, they have a project with some ASM files and somehow it all just works right. No binary object files are generated. They just need to have a basic project file listing the sources. (Kind of a make file, I suppose.) The rest just happens.
The question then becomes... does the linker decide what address/offset to put code at? Or does the assembler?
Yeah. That's the question. I can offer only some modest thoughts.
Since symbols can be external, expressions involving them can only be resolved at link-time. If you are going to support externals, then I think you are stuck with that fact. To me this means that while the assembler can fold the constant parts of an expression, so long as the expression carries any external reference in it, you have to defer its final computation until link-time. That implies retaining a reduced expression tree. (It may involve two or more externals in the expression, so what choice do you have then?)
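A rough Python sketch of what I mean by a reduced expression tree follows. Everything in it (`Const`, `Sym`, `BinOp`, `fold`, the symbol names) is hypothetical and only there to illustrate the idea: the same folding routine runs once at assembly-time with the locally known symbols, and again at link-time once the externals have values.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Sym, BinOp]

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "&": lambda a, b: a & b,
}


def fold(expr: Expr, known: Dict[str, int]) -> Expr:
    """Replace known symbols and collapse every subtree that has become constant."""
    if isinstance(expr, Const):
        return expr
    if isinstance(expr, Sym):
        return Const(known[expr.name]) if expr.name in known else expr
    left = fold(expr.left, known)
    right = fold(expr.right, known)
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(OPS[expr.op](left.value, right.value))
    return BinOp(expr.op, left, right)  # still depends on an unresolved symbol


# (ext_a - ext_b) + (local * 2), where only `local` is known to the assembler
expr = BinOp("+",
             BinOp("-", Sym("ext_a"), Sym("ext_b")),
             BinOp("*", Sym("local"), Const(2)))

reduced = fold(expr, {"local": 4})                         # assembly-time
final = fold(reduced, {"ext_a": 0x8100, "ext_b": 0x8000})  # link-time
```

After the first pass, `reduced` still holds the two externals but the `local * 2` part has collapsed to a constant; after the second pass, `final` is a single `Const`. The object file would carry `reduced`, not a single "symbol plus offset" pair.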
Also, if you are going to support letting the assembler "know about" the DP value, it's possible that the DP expression (the one used in the 'assume' directive the assembler uses to keep track of DP) itself contains externals! So, again, the assembler can't possibly know at assembly-time how to use the assumed-known DP in the context of an LDA instruction that may also reference that external, for example, where the two need to be 'differenced' later to compute the LDA value. So once again, that has to be deferred until link-time. All the assembler can do is to correctly construct the expression tree to be resolved at link-time.
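Here is a small Python sketch of the DP case, under my own assumptions. Closures stand in for the expression tree to keep it short, and `dp_operand` and the symbol names are invented for the example. The point is only that the direct-page operand is the difference between the target and the assumed DP, and that neither side may be known until link-time.

```python
from typing import Callable, Dict

Symbols = Dict[str, int]
Deferred = Callable[[Symbols], int]  # stand-in for an unresolved expression tree


def dp_operand(target: Deferred, assumed_dp: Deferred) -> Deferred:
    """Build the deferred computation of a direct-page operand: target minus DP."""
    def resolve(symbols: Symbols) -> int:
        offset = target(symbols) - assumed_dp(symbols)
        if not 0 <= offset <= 0xFF:
            raise ValueError(f"target is not within the direct page (offset {offset})")
        return offset
    return resolve


# Assembly-time: both the assumed DP and the LDA target refer to externals,
# so all the assembler can do is record how to compute the operand later.
fixup = dp_operand(
    target=lambda s: s["ext_table"] + 2,
    assumed_dp=lambda s: s["ext_dp_base"],
)

# Link-time: the externals finally have values, so the operand can be computed.
link_symbols = {"ext_dp_base": 0x1F00, "ext_table": 0x1F10}
operand = fixup(link_symbols)
instruction = bytes([0xA5, operand])  # $A5 is LDA direct page
```

The range check also has to wait until link-time, which is one more reason the user-visible error messages would need to point back to the original source line.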
Note that if you do this right and well, a user won't even know all the trouble you are going to, here. All they see is that this somehow always "just works right." And they shouldn't have to care, either. It should just work right.
I shudder at ca65's approach. It has an ORG directive, but it's effectively useless because the linker determines the ultimate address. And I also want something like ORG to be available because it's familiar and simple. So I would lean towards everything being determined by the assembler. Object files would consist of the final assembled output, as well as a list of exported symbols, and a list of symbols that need to be imported and "plugged into" the assembled code.
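As I read that proposal, an object file would look roughly like the Python sketch below. The names (`ObjectFile`, `Import`, `link`) and the layout are my guesses for illustration, not a real format. Note that each import here plugs in a bare symbol value only, so it cannot express an operand built from two externals.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Import:
    symbol: str  # name of the external symbol
    offset: int  # position within `code` to patch
    size: int    # number of bytes to write, little-endian


@dataclass
class ObjectFile:
    org: int                                          # address fixed by the assembler
    code: bytearray                                   # final assembled output
    exports: Dict[str, int] = field(default_factory=dict)
    imports: List[Import] = field(default_factory=list)


def link(objects: List[ObjectFile]) -> Dict[str, int]:
    """Collect every export, then plug each import into the assembled code."""
    symbols: Dict[str, int] = {}
    for obj in objects:
        for name, addr in obj.exports.items():
            if name in symbols:
                raise ValueError(f"duplicate export: {name}")
            symbols[name] = addr
    for obj in objects:
        for imp in obj.imports:
            if imp.symbol not in symbols:
                raise KeyError(f"unresolved import: {imp.symbol}")
            value = symbols[imp.symbol]
            obj.code[imp.offset:imp.offset + imp.size] = value.to_bytes(imp.size, "little")
    return symbols


# main calls `init`, which lives in another object; the operand is a placeholder.
main = ObjectFile(org=0x8000, code=bytearray(b"\x20\x00\x00"),  # jsr $0000
                  imports=[Import("init", offset=1, size=2)])
lib = ObjectFile(org=0x9000, code=bytearray(b"\x60"),            # rts
                 exports={"init": 0x9000})
symbols = link([main, lib])
```

After linking, `main.code` holds `jsr $9000`. The addresses themselves never move, which matches the idea of the assembler deciding everything and the linker only filling in holes.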
At some point, I'd like to expose you to that bit of "hacking" I did to the Merlin32 tool to make ASMPATCH. Not because I want you to use it, but because I want you to see how easy it is to specify a patch to a ROM so that you can see if there is anything useful to learn from such examples. I don't know if there is, actually. I'd just like to hear your opinion after seeing an example or two. It might shake out an idea from you.
Furthermore, I would not want the two-step process to be the default behavior for the assembler. I would imagine most use cases are going to be one-shot (even one-file) assemblies.
Yes, I'd like to see something very, very easy to use in the common case of just one assembly source file with patches in it. That's the example I give on the ASMPATCH web link, in fact. Very simple to do. Just works.
If we don't want to go with that idea, how would you want the linker to determine the final address/offsets?
Well, let's exchange more thoughts before I answer here. If it is deferred to link-time, I think it would be "hacked" if there is no expression tree to process at link-time. The Merlin32 tool I started out modifying is really dumb, this way. It ONLY knows about one external and one constant offset to it. And it is really kludgy, as a result. I don't like that. So that would seem to spell out a reduced expression tree presented to the linker, I think. But that's only if you defer things until then. If not, none of it matters.
Syntax for structs is weird, and I'm not sure how I'd want to tackle it. A lot of interfacing with structs requires the coder to manually compute an index (typically by left-shifting the ID several times):
; assuming the struct is 2 bytes wide
; data stored interleaved as: foo/bar/foo/bar/foo/bar
lda mystruct.foo, X
sta mystruct.bar, X
Personally I hate structs and never use them in assembly, since they require that additional shifting, which also means larger indexes, more page crosses, and often padding to space the struct to a convenient size which would otherwise be completely unnecessary.
I've never found a situation where I would prefer a struct to storing each field in its own array. Something like this:
; no struct, data stored in own array, such as: foo/foo/foo/bar/bar/bar
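The difference in index arithmetic between the two layouts can be sketched in Python as follows. The names and sample data are made up for illustration, and the sketch assumes the 2-byte struct from the earlier example.

```python
STRUCT_SIZE = 2   # bytes per record: foo, then bar
FOO, BAR = 0, 1   # field offsets inside one record


def interleaved_index(record_id: int, field_offset: int) -> int:
    """foo/bar/foo/bar: the ID must be scaled (one left shift for a 2-byte struct)."""
    return (record_id << 1) + field_offset


def separate_index(record_id: int) -> int:
    """foo/foo/foo and bar/bar/bar: the ID is used as the index directly."""
    return record_id


interleaved = bytearray([10, 20, 11, 21, 12, 22])
foo = bytearray([10, 11, 12])
bar = bytearray([20, 21, 22])

record = 2
from_struct = interleaved[interleaved_index(record, BAR)]
from_array = bar[separate_index(record)]

# With an 8-bit index register, scaling the ID also halves how many records fit.
max_records_interleaved = 256 // STRUCT_SIZE
max_records_separate = 256
```

Both lookups fetch the same value; the separate-array layout simply skips the scaling step and keeps the full index range available.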
This is why I didn't support them initially. But apparently structs are a popular feature? Barf.
I'm lost on this one. Since this is not something I would use, I have no idea what kind of syntax I would want for it. How would you want the syntax for structs to look?
For bit fields:
Let me think on this. I don't have a ready answer for you. But there are good examples to be found and examined elsewhere. Have you used Microsoft's MASM/ML assembler? It supports structs, masking of bit fields, and so on. It has a syntax for it. Might be worth a look.
You shouldn't need a dropbox account to download the file. Just follow the link and click the download button.
Unless dropbox changed?
Well, it hassled me about setting up an account and I didn't see, right off, a way to avoid it. I'll go try, again, just to be sure.
But yeah I don't mind sending it by email or something. Send me a PM with your email address and I can shoot it over.
If you go to this link, Patching SNES ROMs Directly from Assembly, my address is at the bottom of the page. If I can't get dropbox to work for me, you can send it there.
Also I forgot to mention this:
My assembler didn't make a distinction between labels and other numeric constants. As far as it's concerned they're all just numbers, so adding two labels together would be meaningless but would be completely legal.
Adding what is effectively type safety to numeric values seems like overkill. What benefit would it have other than guarding the very rare mistake of adding two labels?
Hmm. It's useful in understanding what is being asked during assembly. I gave an example already using that complex LDA to a struct object. There is semantic context there. But there are so many other points that need resolving first (the link-time stuff looms large, which includes a lot of lingering questions still) that I really feel this can wait until I understand your direction better. It would be just "made up," right now, without much context and probably just fall on deaf ears. When I better understand your direction, I may be able to put something interesting into that context, then.
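For reference only, one common way assemblers draw the label/constant distinction looks something like this Python sketch. It is not what either of our assemblers does; the `Value` class and its rules are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    number: int
    is_label: bool = False  # True for addresses, False for plain constants

    def __add__(self, other: "Value") -> "Value":
        if self.is_label and other.is_label:
            raise TypeError("adding two labels is meaningless")
        return Value(self.number + other.number, self.is_label or other.is_label)

    def __sub__(self, other: "Value") -> "Value":
        if other.is_label and not self.is_label:
            raise TypeError("cannot subtract a label from a constant")
        # label - label is a plain distance; label - constant is still a label
        return Value(self.number - other.number, self.is_label and not other.is_label)


start = Value(0x8000, is_label=True)
end = Value(0x8010, is_label=True)

length = end - start        # a plain constant: the size of the block
third = start + Value(2)    # still a label: an address inside the block
```

With rules like these, `start + end` raises a `TypeError`, while the usual idioms (label plus offset, label minus label) pass through untouched.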
May 17, 2016, 06:16:44 pm - (Auto Merged - Double Posts are not allowed before 7 days.)
About dropbox. GOT IT! I see what I did wrong. Looks good and I was able to pull it down. Thanks!