
Author Topic: 65816: Direct Page vs Absolute Operand  (Read 21045 times)

jonk

  • Sr. Member
  • ****
  • Posts: 273
Re: 65816: Direct Page vs Absolute Operand
« Reply #40 on: May 21, 2016, 03:15:54 pm »
Quote
I suggest that you read the 65816 specs here ... http://archive.6502.org/datasheets/wdc_w65c816s_aug_4_2008.pdf

That was a manual I hadn't yet found. Thanks for that.

Quote
This [ed: use of square brackets for indirection] was seen as a good-idea by most programmers because it separated the syntax of expression-evaluation from indirection.

That actually sounds like a good call to me, too.

Quote
It [ed: CA65] does use a different syntax for structure definitions and references than the "RS" method that I mentioned previously, probably to simplify its use with the CC65 compiler.

Ignorant of CA65/CC65, the fact that CA65 is and will be designed with the idea of working well with CC65 exposes the possibility of somewhat less than mutual goals in product directions/goals. I believe Disch wants something that a neophyte can start learning to use without having to climb a high learning curve before even getting started. However, I can say that when I looked it over with the idea of simply downloading it and giving it to my son for his current DQ3 SNES work, I knew immediately that I'd have way too much trouble trying to bring him up to the point where he'd use it.

In the end, I chose wisely. I found a very simple assembler with source code that was trivial to modify and hacked together a complete, easy to use tool for him. He bit at it, right away, and has only come back to me twice about it (once for the weird LDA: opcode and once for the LDAL opcode.) Other than that, he has found it very easy to apply.

He is autistic and has difficulty engaging and navigating through "ambiguity." He is a VERY GOOD test of whether or not someone new to assemblers can just "get started" with a tool. He'd never used an assembler before this and he has even more requirements than many about a tool needing to be very simple to use in getting started.

(It's difficult to fully explain why I say this, but think of it this way: my son can sit down with me and do rather complex perturbation theory in orbital mechanics with ease. But I wouldn't trust him to take a bus across town, as he has little understanding of neuro-typical social norms and expectations. He could teach a university classroom. He could be a university student. Because the roles are well-defined and clear to him. But he would be in anguish if left to his own devices at a "party." So I'm pretty picky about the ease of use of the assembly tool and the learning curves involved, as I was trying hard and unsuccessfully to get him to use an assembler. He was just avoiding it and would not go there, instead hand-coding and using Lua as a patcher because he already knew Lua. This tool I modified for him broke through the barriers. So I not only completely agree with Disch about his goals for the assembler, but I personally feel them as important. And having looked over CC65/CA65 before, with an eye out for my son, I looked elsewhere almost right away.)

Quote
If you find that it's missing features that you feel that you need, then may I suggest that it might be more profitable to the 65xx programming community as a whole to attempt to extend CA65 before throwing all the toys out of the pram and starting from scratch.

If it is based on GNU then there is quite a high wall for initial entry; getting to the point of being effective in making such code modifications both well and appropriately. But again, that only comes from trying to go through gcc (not gas) some years back. I may be wrong on this point, too.
« Last Edit: May 21, 2016, 03:32:31 pm by jonk »
An equal right to an opinion isn't a right to an equal opinion. -- 1995, me
Saying religion is the source of morality is like saying a squirrel is the source of acorns.  -- 2002, me

Disch

  • Hero Member
  • *****
  • Posts: 2814
  • NES Junkie
« Reply #41 on: May 21, 2016, 03:43:44 pm »
Quote
I suggest that you read the 65816 specs here ... http://archive.6502.org/datasheets/wdc_w65c816s_aug_4_2008.pdf

Thank you for this.  =)

Quote
Having the assembler automatically take advantage of direct page access is a part of the standard, and was definitely used in the SNASM assembler that I used at the time.
[snip]
From what I can see, CA65 is the current standards-bearer in 65xx assemblers.

CA65 lets you set a zero page segment, and automatically trims 1-byte addresses to use zero/direct page mode, but apart from that does not support automatic direct page detection as far as I can tell.  I can't find any directive that lets you tell it where DP is supposed to be.  Or even DBR for that matter.  And there's no possible way it could do this properly without such a directive.

Ref page:  http://www.cc65.org/doc/ca65-11.html#ss11.1

This is a rather contrived example to keep the concept I'm trying to illustrate simple.... but I've found myself wanting to do things like this quite often.  If you can show me how this can be done on ca65, I would love to see it:

Code: [Select]
bigarray =  $0800       ; assume these are defined externally
snesreg =   $2118

;-------------
sep #$20                ; set data bank & tell assembler where it is
lda #^snesreg
pha
plb                     ; plb (not pld) pulls the bank byte into DBR
#databank ^snesreg

;-------------
rep #$30
lda #bigarray           ; set direct page and tell assembler where it is
tcd
#directpage  bigarray

;-------------
ldx #0
:   lda bigarray        ; <- I want direct page without having to specify
    sta snesreg         ; <- I want absolute without having to specify
    inx
    inx
    bne :-

I've never seen any assembler capable of doing this.  If I'm wrong, please show me one.
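
[ed: A minimal Python sketch of the selection rule the example above asks for, assuming the assembler has been told the current direct page and data bank values. The names `pick_mode` and `assemble` and their parameters are hypothetical and do not describe any existing assembler; only the LDA/STA opcode bytes are standard 65816 encodings.]

```python
# Hypothetical sketch: choose the shortest addressing mode for a 24-bit
# target address, given what the assembler was told about D and DBR.
OPCODES = {  # standard 65816 opcode bytes
    ("lda", "dp"): 0xA5, ("lda", "abs"): 0xAD, ("lda", "long"): 0xAF,
    ("sta", "dp"): 0x85, ("sta", "abs"): 0x8D, ("sta", "long"): 0x8F,
}

def pick_mode(target, direct_page, data_bank):
    """Return (mode, operand_bytes) for a full 24-bit target address."""
    bank, offset = target >> 16, target & 0xFFFF
    # The direct page lives in bank 0 and covers D .. D+$FF.
    # (Simplification: a window that wraps past $FFFF is not handled.)
    if bank == 0 and 0 <= offset - direct_page <= 0xFF:
        return "dp", [offset - direct_page]
    if bank == data_bank:
        return "abs", [offset & 0xFF, offset >> 8]
    return "long", [offset & 0xFF, offset >> 8, bank]

def assemble(mnemonic, target, direct_page, data_bank):
    mode, operand = pick_mode(target, direct_page, data_bank)
    return bytes([OPCODES[(mnemonic, mode)]] + operand)

# Values from the example: D = $0800, DBR = $00.
print(assemble("lda", 0x000800, 0x0800, 0x00).hex())  # a500
print(assemble("sta", 0x002118, 0x0800, 0x00).hex())  # 8d1821
```

The point of the sketch is that the same symbol can need one, two, or three operand bytes depending on register state that only the programmer knows, which is why some directive along the lines of the example's #directpage and #databank would be needed.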


Quote
I understand that you really don't like its linker syntax ... and I agree that it is a bit brutal at first look, but it provides the flexibility to create pretty-much any output ROM layout.

Can you do hot patching with it?  As in, if I want to modify an existing file and not create one from scratch.  Basically what makes xkas so useful.

I'm not dissing on ca65 just because I don't like the linker syntax.  It's a very good assembler and I've used it in the past (I even use it in my FF1 disassembly), but it falls short in more than one way.

Quote
If you find that it's missing features that you feel that you need, then may I suggest that it might be more profitable to the 65xx programming community as a whole to attempt to extend CA65 before throwing all the toys out of the pram and starting from scratch.

Where and how symbols are resolved, and where and how instruction sizes are determined makes all the difference here.  ca65 does the latter in the assembler, but the former in the linker.

Therefore by the fundamental way it's designed, it can never work the way I want it to because instruction sizes are determined before symbols are fully resolved.  For me to "fix" this in ca65, it'd be a core design change.  I'd basically have to rip the entire thing apart and rebuild it.

It's easier to just start from scratch.

And way more fun.
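
[ed: A minimal Python sketch of why instruction sizing and symbol resolution are coupled, as argued above. It is a contrived toy, not how ca65 or any other real assembler is implemented: the `layout` function, its item format, and the "grow only" rule are all assumptions made for illustration. Each `lda` starts at the optimistic 2-byte direct page size and grows to 3 bytes once its resolved target turns out to lie outside the direct page window.]

```python
def layout(program, origin, direct_page):
    """program: list of ("label", name) or ("lda", symbol) items.
    Returns (symbols, sizes) once the instruction sizes stop changing."""
    sizes = [2 if kind == "lda" else 0 for kind, _ in program]  # optimistic
    while True:
        # Step 1: place labels using the current size guesses.
        symbols, pc = {}, origin
        for (kind, name), size in zip(program, sizes):
            if kind == "label":
                symbols[name] = pc
            pc += size
        # Step 2: recheck each instruction against the resolved symbols.
        changed = False
        for i, (kind, name) in enumerate(program):
            if kind == "lda" and sizes[i] == 2:
                if not 0 <= symbols[name] - direct_page <= 0xFF:
                    sizes[i] = 3      # grow to absolute; never shrink back
                    changed = True
        if not changed:
            return symbols, sizes

prog = [("lda", "table"), ("lda", "table"), ("label", "table")]
symbols, sizes = layout(prog, origin=0x08FE, direct_page=0x0800)
print(sizes, hex(symbols["table"]))  # [3, 3, 0] 0x904
```

Growing the instructions moves `table` itself, so the label's final address is only known after the sizes settle. A tool that fixes sizes in one stage and resolves symbols in a later one cannot run this loop.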

elmer

  • Full Member
  • ***
  • Posts: 122
« Reply #42 on: May 21, 2016, 04:18:07 pm »
Quote
Ignorant of CA65/CC65, the fact that CA65 is and will be designed with the idea of working well with CC65 exposes the possibility of somewhat less than mutual goals in product directions/goals.

It's actually an easier syntax than the old "RS" method, and I suspect that you'd like it.

You basically just define a structure in a similar way to your PSP example, and then refer to structure members with a "::" syntax.

http://cc65.github.io/doc/ca65.html#structs


Quote
If it is based on GNU then there is quite a high wall for initial entry; getting to the point of being effective in making such code modifications both well and appropriately. But again, that only comes from trying to go through gcc (not gas) some years back. I may be wrong on this point, too.

Nope, you're right IMHO ... the GNU codebase is a bit ... yuk! I've updated/added V810 processor support to GCC/binutils, and it wasn't a particularly pleasant experience.

CC65/CA65/LD65 have nothing to do with the GNU codebase.

CC65 is a descendant of Small C, and everything else is custom code that seems to have been put together quite nicely (IMHO).

It certainly didn't take me long to add the square-bracket support to the codebase.


Quote
I believe Disch wants something that a neophyte can start learning to use without having to climb a high learning curve before even getting started. However, I can say that when I looked it over with the idea of simply downloading it and giving it to my son for his current DQ3 SNES work, I knew immediately that I'd have way too much trouble trying to bring him up to the point where he'd use it.

I totally understand that you want something easy-to-use for ROM hacking, and that you've got special considerations in trying to produce something that your son will use.

CA65/LD65 does not support loading/overwriting a ROM image, and that certainly makes it unsuitable for your current needs.

Now, whether you guys choose to try to extend CA65/LD65 to make them easier-to-use, or whether you choose to modify something else, or start from scratch ... the one thing that I'd heartily recommend is that you follow existing standards as much as possible, and not just invent some whole new syntax that's going to make any code that's written in your dialect be a total pain to reuse or to move to other standards-compliant assemblers.


May 21, 2016, 04:41:22 pm - (Auto Merged - Double Posts are not allowed before 7 days.)

Quote
This is a rather contrived example to keep the concept I'm trying to illustrate simple.... but I've found myself wanting to do things like this quite often.  If you can show me how this can be done on ca65, I would love to see it:
[snip]
I've never seen any assembler capable of doing this.  If I'm wrong, please show me one.

Thank you for the example.

IIRC SNASM supported something similar ... but I don't have the software/documentation any more in order to verify that, so I can't prove it.

I see what you want ... and it's reasonable, and you're right, CA65 doesn't support it AFAIK.

It wouldn't be hard to add directives to CA65 to do that ... but given CA65/LD65's architecture, the labels would still need to be resolvable at assembly time, and not link time, in just the same way that the zero-page references work.


Quote
Therefore by the fundamental way it's designed, it can never work the way I want it to because instruction sizes are determined before symbols are fully resolved.  For me to "fix" this in ca65, it'd be a core design change.

You're right ... if your design-requirement is that the instruction-size can change during linking, and that's a must-have feature, then CA65 isn't going to do what you want, and you're going to have to write something new.

The cost/benefit/fun calculation is something that only you guys, as the prospective authors, can decide.
« Last Edit: May 21, 2016, 04:43:04 pm by elmer »

jonk

« Reply #43 on: May 21, 2016, 04:48:46 pm »
Quote
Nope, you're right IMHO ... the GNU codebase is a bit ... yuk! I've updated/added V810 processor support to GCC/binutils, and it wasn't a particularly pleasant experience.

Hehe. I'm quite experienced (I've done compilers, interpreters, assemblers, and linkers more than once in my life) and in this GNU case found it requiring more time than I had considered acceptable. So I backed off of the idea.

Quote
CC65/CA65/LD65 have nothing to do with the GNU codebase.
CC65 is a descendant of Small C, and everything else is custom code that seems to have been put together quite nicely (IMHO).

Ah. That goes back a ways!! Cripes.

I might have preferred lcc, perhaps, since there is a very nice book out on the topic, it's based on newer roots, there is some active support and additional compiler-compiler tools around that work with it, and it is perhaps even a little more complete.

But thanks for the clue. That helps!

Quote
I totally understand that you want something easy-to-use for ROM hacking, and that you've got special considerations in trying to produce something that your son will use.
CA65/LD65 does not support loading/overwriting a ROM image, and that certainly makes it unsuitable for your current needs.
Now, whether you guys choose to try to extend CA65/LD65 to make them easier-to-use, or whether you choose to modify something else, or start from scratch ... the one thing that I'd heartily recommend is that you follow existing standards as much as possible, and not just invent some whole new syntax that's going to make any code that's written in your dialect be a total pain to reuse or to move to other standards-compliant assemblers.

Well, Disch doesn't want or need my help. So I'm in the kibitzing mode! Which I absolutely love! (It's a heck of a lot easier to be a gadfly.  ;)) But yes, where necessary and sufficient and widely used existing standards exist, I'd say it's poor form not to at least know why you aren't following them, if you choose not to. I'll bet Disch follows up.


elmer

« Reply #44 on: May 22, 2016, 03:06:55 pm »
Quote
Ah. That goes back a ways!! Cripes.
I might have preferred lcc, perhaps, since there is a very nice book out on the topic, it's based on newer roots, there is some active support and additional compiler-compiler tools around that work with it, and it is perhaps even a little more complete.

From what I can see, there aren't many C compilers for the 6502 available, and the ones that I've found are mostly Small-C descendants with (at best) ANSI C syntax added on. As you already know, C as-a-language really doesn't match well with the original 6502 architecture, and AFAIK CC65 and other 6502 C compilers produce pretty poor code.

But some homebrew folks prefer to avoid all-assembly and write very-heavily-massaged C code, with just some optimized assembly functions for the most time-critical parts.

I don't quite understand that, myself, because by the time that you've learned all the compiler-specific rules to optimize the C output, and have totally changed the way that you write your C code, then IMHO you might as well have just written the whole thing in assembly-language using a good library of macros.

****************

As has been pointed out ... CC65 doesn't generate any 65816-specific code AFAIK, it's just outputting source for the original 6502.

The advances in the 65816 make it much better suited to C, and WDC will sell (give?) you an ANSI C compiler for the 65816 (and for the 65C02 too!).





jonk

« Reply #45 on: May 22, 2016, 03:34:01 pm »
Quote
From what I can see, there aren't many C compilers for the 6502 available, and the ones that I've found are mostly Small-C descendants with (at best) ANSI C syntax added on. As you already know, C as-a-language really doesn't match well with the original 6502 architecture, and AFAIK CC65 and other 6502 C compilers produce pretty poor code.

It's not a good fit, in the sense of having instructions that make writing an effective C compiler 'easier.' But of course, there are a lot of processors that aren't particularly good fits to C (in that sense), such as the Microchip PIC parts which are very 'bare-metal' kinds of processors with all the ugliness exposed into plain view and a tiny hardware stack for return addresses to boot. (Let alone the 8051 core.) Doesn't change the fact that most people writing embedded code for such processors still use C (mixed with some assembly.)

I taught CS courses at the largest 4yr university in Oregon, where C (at that time) was required for freshman and sophomore years. I taught computer architecture, assembly, operating systems, and concurrent programming classes back then. I can assure you that the students generally HATED the assembly class and wanted to get right back into C/C++ as soon as possible and to NEVER look back again on assembly coding. (Except for the 5 or 6 students in a class of 75 that were from the EE department -- those folks had no problems with assembly and wanted it.) So it is no mystery to me about using C, even on the PIC parts.

With lower costs in manufacturing, smaller feature sizes, and improving yields, the manufacturers are of course responding to the reality of the now very large, lower tiers of programming skills that are cheaply and widely available to companies needing programmers. They are providing vast seas of flash memory on their processors and rapidly moving towards 32-bit cores and sophisticated memory management for general purpose operating systems and away from 8-bit cores (it's a little odd that 16-bit was largely skipped over, barring the MSP430, but there are some historical reasons there.)


Quote
But some homebrew folks prefer to avoid all-assembly and write very-heavily-massaged C code, with just some optimized assembly functions for the most time-critical parts.

That's not uncommon for embedded work. Companies almost insist on it, fearing "delays" threatened (and touted) by their employees who really aren't skilled at assembly coding, don't want to learn it, and remember their experiences in school with fear and trepidation. In any case, company management is probably wiser in choosing C/C++ w/asm because they can more readily find programmers able to use C/C++ and therefore pay less for them, besides. (Which means they can fire obnoxious asm programmers, too!)


Quote
I don't quite understand that, myself, because by the time that you've learned all the compiler-specific rules to optimize the C output, and have totally changed the way that you write your C code, then IMHO you might as well have just written the whole thing in assembly-language using a good library of macros.

That depends on a lot of factors. I tend to agree with the thrust in your writing, because I'm experienced with assembly (40+ years of it) and pretty much can write it as fast as I can think in any other language. But on the other hand, with the growth in availability of 32 bit cores -- ARM based and MIPS based -- and the need for general purpose operating systems (so that 'embedded' programmers don't really need to know anything special at all about embedded work and can just be dropped in from Windows or Linux development) and large codec and video and graphics and other libraries they can tap into without having to pay anything for them or lay hands on them.... they pretty much are forced into C/C++ by the sheer scale of the projects that are now possible with these new systems. So my opinion is a little nuanced here. I'd go different ways depending on the goals.


Quote
As has been pointed out ... CC65 doesn't generate any 65816-specific code AFAIK, it's just outputting source for the original 6502.

Yeah, someone here mentioned that news to me, earlier, and I retained it. Maybe you?


Quote
The advances in the 65816 make it much better suited to C, and WDC will sell (give?) you an ANSI C compiler for the 65816 (and for the 65C02 too!).

I have the WDC toolset loaded into my machine. It doesn't work well. Funny, in that it is dated from 2013, I think, which would seem to mean it is "current," sort of. Their C compiler, called WDC816CC.EXE, pops up an error message telling me "E0002 -- Could not get a license" and then goes on to tell me I don't have a product or group license for it. The assembler, WDC816AS.EXE, seems to work just fine. Their IDE is simply broken. You can't use it, at all. Keeps wanting a "version" to be given to it, but provides no way to do so that I'm aware of. I think this has to do with character set versions, looking around a bit, but I'm not sure. Regardless, I can't get the IDE working much at all. It actually crashes.
« Last Edit: May 22, 2016, 03:45:54 pm by jonk »

AWJ

  • Full Member
  • ***
  • Posts: 105
« Reply #46 on: May 22, 2016, 03:58:22 pm »
Quote
The only difference that I remember in practice, was that a lot of assemblers supported using square-brackets instead of parentheses for indirection, i.e. ...

Code: [Select]
  lda [$01],y

instead of

Code: [Select]
  lda ($01),y

This was seen as a good-idea by most programmers because it separated the syntax of expression-evaluation from indirection.

On the 65816, lda ($01),y and lda [$01],y are two different addressing modes. The first is indirect (dereferences a 16-bit pointer into the current DB), the second is indirect long (dereferences a 24-bit pointer).
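
[ed: A minimal Python sketch of the difference between the two modes described above, using a dict as a toy memory image. The function names and the memory model are hypothetical. The sketch also assumes that the index is added across the full 24-bit address, which is my understanding of native mode but is not stated in the post.]

```python
def read_bank0(mem, addr, nbytes):
    """Little-endian read from bank 0 (where the direct page lives)."""
    return sum(mem.get((addr + i) & 0xFFFF, 0) << (8 * i) for i in range(nbytes))

def ea_indirect_y(mem, operand, d, dbr, y):
    """lda (dp),y: 16-bit pointer from the direct page, bank taken from DBR."""
    pointer = read_bank0(mem, d + operand, 2)
    return ((dbr << 16) + pointer + y) & 0xFFFFFF   # assumed 24-bit add

def ea_indirect_long_y(mem, operand, d, y):
    """lda [dp],y: 24-bit pointer from the direct page carries its own bank."""
    pointer = read_bank0(mem, d + operand, 3)
    return (pointer + y) & 0xFFFFFF

# Three bytes stored at $01..$03: 00 80 7E
mem = {0x01: 0x00, 0x02: 0x80, 0x03: 0x7E}
print(hex(ea_indirect_y(mem, 0x01, d=0, dbr=0x00, y=0x10)))   # 0x8010
print(hex(ea_indirect_long_y(mem, 0x01, d=0, y=0x10)))        # 0x7e8010
```

The same operand byte therefore reaches different memory depending on the bracket style, which is why the two spellings cannot be treated as interchangeable on this CPU.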

jonk

« Reply #47 on: May 22, 2016, 04:40:22 pm »
Quote
On the 65816, lda ($01),y and lda [$01],y are two different addressing modes. The first is indirect (dereferences a 16-bit pointer into the current DB), the second is indirect long (dereferences a 24-bit pointer).

There are all kinds of odd-ball addressing modes provided on the 65816 and the syntax used is kind of 'strained' to fit it.

I don't know how an assembler is supposed to handle these three, if external symbolics are used instead of a constant value. Worse, I'm not sure how an assembly programmer is supposed to be able to force an absolute indexed with X addressing mode if the absolute address is to be $0010. Or even accurately read source code using such symbols, given various assemblers and not knowing what some specific assembler does in each case. But here they are:
Code: [Select]
     LDA $10,X          ; direct page indexed with X
     LDA $1010,X        ; absolute indexed with X
     LDA $101010,X      ; absolute long indexed with X

The following pair is inconsistent:
Code: [Select]
     LDA ($10,X)        ; direct page indexed indirect with X
     JMP ($10,X)        ; absolute indexed indirect with X
All the above shows is that there are two different meanings for the same operand syntax, depending upon the opcode. Ugly.
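
[ed: A minimal Python sketch of that inconsistency in terms of the bytes emitted. The table and the `encode` helper are hypothetical; the two opcode bytes are the standard 65816 encodings for these instructions.]

```python
# (mnemonic, operand syntax) -> (opcode byte, operand size in bytes)
ENCODINGS = {
    ("lda", "(op,x)"): (0xA1, 1),   # direct page indexed indirect
    ("jmp", "(op,x)"): (0x7C, 2),   # absolute indexed indirect
}

def encode(mnemonic, syntax, value):
    opcode, size = ENCODINGS[(mnemonic, syntax)]
    return bytes([opcode]) + value.to_bytes(size, "little")

print(encode("lda", "(op,x)", 0x10).hex())  # a110
print(encode("jmp", "(op,x)", 0x10).hex())  # 7c1000
```

Identical operand text yields a one-byte operand for LDA and a two-byte operand for JMP, so the operand width here is decided by the mnemonic rather than by anything written in the operand.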

But these do make some sense:
Code: [Select]
     LDA ($10),Y        ; direct page indirect indexed with Y
     LDA [$10],Y        ; direct page indirect long indexed with Y
     LDA ($10)          ; direct page indirect
     LDA [$10]          ; direct page indirect long

Still, I think I'd be open to a clean re-design of the operand syntax for the 65816. And I certainly understand the desire to allow () in expressions without further confusing an assembler tool.
« Last Edit: May 22, 2016, 04:46:48 pm by jonk »

elmer

« Reply #48 on: May 22, 2016, 05:26:33 pm »
Quote
That depends on a lot of factors. [snip] So my opinion is a little nuanced here. I'd go different ways depending on the goals.

Oh, I have no disagreement at all. C/C++ is faster to write in, and I have no complaint at all with people using it on architectures that provide a compatible environment for it (i.e. some 8-bit, and every 16-bit-or-higher architecture that I'm familiar with).

The 6502/65C02 just isn't one of those IMHO, unless you really, really, really tailor/limit your compiler even more than I've seen (so far) in practice.

David Wheeler's page on 6502 language implementation gathers together some interesting ideas on optimizing a 6502 compiler ...
http://www.dwheeler.com/6502/a-lang.txt

When you're already having to severely mangle your C code to make it semi-efficient on the platform, I don't see much downside in putting a few more restrictions on it in order to generate even-better code.

In particular, I love the idea of limiting C's stack to a split lo-byte/hi-byte fixed area of memory (2 256-byte absolute locations), and then automatically turning "large" local variable definitions into automatic heap allocations.
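
[ed: A minimal Python sketch of the split lo-byte/hi-byte stack idea, as I read it from the description above. The class name and layout are hypothetical; on a real 6502 the two pages would be fixed absolute addresses indexed by a single 8-bit register.]

```python
class SplitStack:
    """16-bit value stack stored as two parallel 256-byte pages."""

    def __init__(self):
        self.lo = bytearray(256)   # low bytes of every slot
        self.hi = bytearray(256)   # high bytes of every slot
        self.sp = 0xFF             # one 8-bit index shared by both pages

    def push(self, value):
        self.lo[self.sp] = value & 0xFF
        self.hi[self.sp] = (value >> 8) & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pop(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.lo[self.sp] | (self.hi[self.sp] << 8)

stack = SplitStack()
stack.push(0x1234)
stack.push(0xBEEF)
print(hex(stack.pop()), hex(stack.pop()))  # 0xbeef 0x1234
```

Because both halves of a slot share one index, a 16-bit slot can be reached without any pointer arithmetic, at the cost of capping the stack at 256 entries. That cap is presumably why "large" locals would have to live somewhere else.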


Quote
On the 65816, lda ($01),y and lda [$01],y are two different addressing modes. The first is indirect (dereferences a 16-bit pointer into the current DB), the second is indirect long (dereferences a 24-bit pointer).

Yep, my patch to add square-brackets to CA65 specifically documents not to use it in 65816 mode, for exactly that reason (because it hides the indirect long addressing mode from the assembler).

Using the square-brackets was only common on 6502 assemblers.
« Last Edit: May 22, 2016, 05:32:29 pm by elmer »

jonk

« Reply #49 on: May 22, 2016, 05:37:42 pm »
Quote
I suggest that you read the 65816 specs here ... http://archive.6502.org/datasheets/wdc_w65c816s_aug_4_2008.pdf
See pages 37-40 for the assembly language standards.

On page 17, section 3.5, it says "Words, arrays, records, or any data structures may span 64 KByte bank boundaries with no compromise in code efficiency."

But then, for example, it says in section 3.5.3, "With Absolute Indexed with X (a,x) addressing the second and third bytes of the instruction are added to the X Index Register to form the low order 16 bits of the effective address. The Data Bank Register contains the high order 8 bits of the effective address." Parsing those words carefully, I'd imagine that a data structure could NOT "span 64 KByte bank boundaries with no compromise in code efficiency" using this mode. Instead, it appears that only a 16-bit ALU add is used and that any carry is tossed away, not added to the temporary DBR value to form up a 24-bit address. However, also in section 3.5.3, they show the following:

Code: [Select]
[  DBR  ][ addrh ][ addrl ]
   +                  X Reg
---------------------------
          effective address
Which suggests to me the possibility of a carry out of addrh affecting the bank value of the effective address.

So, is section 3.5 insane? Or is it section 3.5.3?
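
[ed: A minimal Python sketch that puts the two readings side by side. Both function names are hypothetical, and the sketch makes no claim about which reading the silicon implements; it only shows where they diverge.]

```python
def abs_x_wrap16(dbr, addr, x):
    """Reading of the text: 16-bit add, carry discarded, bank fixed at DBR."""
    return (dbr << 16) | ((addr + x) & 0xFFFF)

def abs_x_carry24(dbr, addr, x):
    """Reading of the diagram: X is added to the full 24-bit DBR:addr value."""
    return ((dbr << 16) + addr + x) & 0xFFFFFF

print(hex(abs_x_wrap16(0x7E, 0xFFF0, 0x20)))   # 0x7e0010
print(hex(abs_x_carry24(0x7E, 0xFFF0, 0x20)))  # 0x7f0010
```

The two readings agree until the sum passes $FFFF. Past that point only the second one lets a data structure continue into the next bank, so only the second is compatible with the claim in section 3.5.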




May 22, 2016, 05:52:26 pm - (Auto Merged - Double Posts are not allowed before 7 days.)
Quote
David Wheeler's page on 6502 language implementation gathers together some interesting ideas on optimizing a 6502 compiler ...
http://www.dwheeler.com/6502/a-lang.txt
[snip]
In particular, I love the idea of limiting C's stack to a split lo-byte/hi-byte fixed area of memory (2 256-byte absolute locations), and then automatically turning "large" local variable definitions into automatic heap allocations.

David's page is really focused on the 6502. I've got my head into the 65816 right now. But I can see why it may be useful, if still more headaches ahead, to keep both contexts in mind. (Well, there is also the somewhat incompatible 65C02, as well. Why not really get a nice headache and include all three fully and completely?)

I'm not entirely sure about your thoughts of using heap. I initially read your words (before seeing the 'automatic heap' part) as meaning that one might do call stack analysis with the idea of placing local variables in static memory (in the sense used when discussing C variable lifetimes) found on bank 0. In this sense, they'd just be static. Not heap allocated. The C compiler would have to trace out the call tree, of course. But that is already done with 8051 C compilers where there is only 128 bytes or 256 bytes within the CPU itself and the technology is understood pretty well. (Recursion, of course, is an issue.) But I suppose you mean that there would be a single heap allocation used to place the static region and that after that occurs, it would be treated as static. Is that about right? Or did I miss something important?

I note that David's page includes a short comment about CC65's -static-locals option. Is it different than what you propose?


Quote
On the 65816, lda ($01),y and lda [$01],y are two different addressing modes. The first is indirect (dereferences a 16-bit pointer into the current DB), the second is indirect long (dereferences a 24-bit pointer).
Yep, my patch to add square-brackets to CA65 specifically documents not to use it in 65816 mode, for exactly that reason (because it hides the indirect long addressing mode from the assembler).
Using the square-brackets was only common on 6502 assemblers.

Does CA65 follow the WDC manual on these?
Code: [Select]
     LDA ($10,X)        ; direct page indexed indirect with X
     JMP ($10,X)        ; absolute indexed indirect with X
That really looks inconsistent to me and just begs for a syntax adjustment that clarifies what's going on in the two cases.
« Last Edit: May 22, 2016, 05:55:13 pm by jonk »
An equal right to an opinion isn't a right to an equal opinion. -- 1995, me
Saying religion is the source of morality is like saying a squirrel is the source of acorns.  -- 2002, me

elmer

  • Full Member
  • ***
  • Posts: 122
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #50 on: May 22, 2016, 11:16:45 pm »
There are all kinds of odd-ball addressing modes provided on the 65816 and the syntax used is kind of 'strained' to fit it.

I don't know how an assembler is supposed to handle these three, if external symbolics are used instead of a constant value. Worse, I'm not sure how an assembly programmer is supposed to be able to force an absolute indexed with X addressing mode if the absolute address is to be $0010. Or even accurately read source code using such symbols, given various assemblers and not knowing what some specific assembler does in each case. But here they are:
Code:
     LDA $10,X          ; direct page indexed with X
     LDA $1010,X        ; absolute indexed with X
     LDA $101010,X      ; absolute long indexed with X

Well, according to my reading of the standards, "LDA unknownvar,X" is going to assemble to a 2-byte address.

If the programmer wants something else, they have to use an override.

To force an absolute indexed addressing mode when the assembler already knows that the high-byte is zero, then you're also going to need an override ...

LDA |$1010,X

So, there are clear and simple rules ... they just don't always look nice, nor are they always obvious.
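Those rules can be written down as a small decision function. This is a Python sketch under stated assumptions, not the logic of any particular assembler: the override characters (`<` for direct page, `|` or `!` for absolute, `>` for long) follow the WDC convention as I understand it, `choose_mode` and `encode_lda_x` are hypothetical names, and the width is picked from the operand's value rather than from how many digits were typed.

```python
# Opcodes for LDA dp,X / LDA abs,X / LDA long,X and their operand widths.
OPCODES = {"dp": 0xB5, "abs": 0xBD, "long": 0xBF}
WIDTHS = {"dp": 1, "abs": 2, "long": 3}
FORCE = {"<": "dp", "|": "abs", "!": "abs", ">": "long"}

def choose_mode(operand, value=None):
    """operand: operand text without ',X'; value: resolved value, or None
    when the symbol is external/unknown at assembly time."""
    if operand[0] in FORCE:
        return FORCE[operand[0]]       # explicit override always wins
    if value is None:
        return "abs"                   # unknown symbol: assume a 2-byte address
    if value <= 0xFF:
        return "dp"
    if value <= 0xFFFF:
        return "abs"
    return "long"

def encode_lda_x(operand, value=None):
    mode = choose_mode(operand, value)
    width = WIDTHS[mode]
    resolved = 0 if value is None else value   # 0 is a placeholder for the linker
    low_bytes = resolved & ((1 << (8 * width)) - 1)
    return bytes([OPCODES[mode]]) + low_bytes.to_bytes(width, "little")

dp_form = encode_lda_x("$10", 0x10)
forced_abs = encode_lda_x("|$0010", 0x10)
long_form = encode_lda_x("$101010", 0x101010)
```

The `forced_abs` case is the one asked about: address `$0010` with an absolute override assembles to the three-byte form instead of collapsing to direct page.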


On page 17, section 3.5, it says "Words, arrays, records, or any data structures may span 64 KByte bank boundaries with no compromise in code efficiency."

But then, for example, it says in section 3.5.3, "With Absolute Indexed with X (a,x) addressing the second and third bytes of the instruction are added to the X Index Register to form the low order 16 bits of the effective address. The Data Bank Register contains the high order 8 bits of the effective address." Parsing those words carefully, I'd imagine that a data structure could NOT "span 64 KByte bank boundaries with no compromise in code efficiency" using this mode. Instead, it appears that only a 16-bit ALU add is used and that any carry is tossed away, not added to the temporary DBR value to form up a 24-bit address. However, also in section 3.5.3, they show the following:

Code:
[  DBR  ][ addrh ][ addrl ]
   +                  X Reg
---------------------------
          effective address
Which suggests to me the possibility of a carry out of addrh affecting the bank value of the effective address.

So, is section 3.5 insane? Or is it section 3.5.3?

My old data sheet matches your section 3.5.3, and not the marketing-bulls*t in section 3.5.

The illustration is basically the same, but perhaps a tiny bit less ambiguous ...

Code:
|  DBR  | addrh | addrl |
       +|       | X Reg |
-------------------------
        effective address

So, I'm calling BS on section 3.5.


Quote
David's page is really focused on the 6502. I've got my head into the 65816 right now. But I can see why it may be useful, if still more headaches ahead, to keep both contexts in mind. (Well, there is also the somewhat incompatible 65C02, as well. Why not really get a nice headache and include all three fully and completely?)

Yeah, we're coming at this from different perspectives.

I really don't care about the extra stuff in the 65816. There are already a couple of 65816 C compilers, like the WDC one ... and I believe that that one is supposed to be descended from whatever compiler was used (by those few people that used it) on the SNES.

I'm much more interested in the 6502-variants, particularly in the HuC6280 that's in the PC Engine, which is basically a 65C02 with bank mapping and a couple of extra instructions.


Quote
I'm not entirely sure about your thoughts of using heap. I initially read your words (before seeing the 'automatic heap' part) as meaning that one might do call stack analysis with the idea of placing local variables in static memory (in the sense used when discussing C variable lifetimes) found on bank 0. In this sense, they'd just be static. Not heap allocated. The C compiler would have to trace out the call tree, of course.

David Wheeler refers to the same idea, and IMHO, that would be the gold-standard to aim for in a 6502 C compiler for performance.

I don't even know if it would be possible to retrofit that into one of the existing 6502 C compilers, or if you'd just be better-off starting from scratch.

Either way ... it's way above my interest-level and capability to work on myself.


Quote
But I suppose you mean that there would be a single heap allocation used to place the static region and that after that occurs, it would be treated as static. Is that about right? Or did I miss something important?

Nope, I'm talking about a quick-and-easy hack to an existing 6502 C compiler in an attempt to improve 80% of the parameter/local stack handling, at the expense of using dynamic heap-allocation for what would normally be stack-based local variables if they are deemed "too big".

The idea being to keep the regular C parameter/local stack addressable as "absaddr,X" instead of "[cstack],Y". It would make access to the C stack a lot quicker, but limited to 256 entries (byte, word or long).

That's "nasty" for a general-purpose compiler, but IMHO, would be acceptable for writing games on a console.
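One possible reading of that proposal, modeled in Python rather than 6502 code. Everything here is an assumption made for illustration: the `IndexedStack` class, the four parallel 256-byte "lanes" (so a byte, word or long entry is reached with the same X as `lane0,X`, `lane1,X`, and so on), and the dict standing in for the heap.

```python
class IndexedStack:
    SLOTS = 256  # an 8-bit X register can index at most 256 entries

    def __init__(self):
        # One page per byte lane: entry n lives at lanes[0][n] .. lanes[3][n].
        self.lanes = [bytearray(self.SLOTS) for _ in range(4)]
        self.x = self.SLOTS      # stack grows downward; 256 means empty
        self.heap = {}           # stand-in for the dynamic heap
        self.next_handle = 1

    def push(self, value=0):
        if self.x == 0:
            raise OverflowError("stack full: only 256 entries fit under X")
        self.x -= 1
        for lane, page in enumerate(self.lanes):
            page[self.x] = (value >> (8 * lane)) & 0xFF
        return self.x

    def load(self, slot, width=4):
        """Read a byte (1), word (2) or long (4) entry from the given slot."""
        return sum(self.lanes[lane][slot] << (8 * lane) for lane in range(width))

    def push_local(self, size):
        """Small locals take a slot; 'too big' ones go to the heap and the
        slot only holds a handle to them."""
        if size <= 4:
            return self.push(0)
        handle = self.next_handle
        self.next_handle += 1
        self.heap[handle] = bytearray(size)
        return self.push(handle)

stack = IndexedStack()
slot = stack.push(0x12345678)
big = stack.push_local(100)   # e.g. a 100-byte local array
```

The cost shows up in `push_local`: anything larger than one entry needs an allocation and an extra indirection, which is the trade being accepted for games on a console.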


Quote
I note that David's page includes a short comment about CC65's -static-locals option. Is it different than what you propose?

Yes, that option basically just turns every local variable into a "static" variable with no attempt to optimize the usage with a call-tree trace in the way that you were thinking about above.

My cheap-and-nasty proposal is another one of the ideas on David's page, but it keeps the stack concept and tries to improve its performance.

The ideas aren't necessarily exclusive ... I can imagine scenarios when you might want to use both techniques.


Quote
Does CA65 follow the WDC manual on these?
Code:
     LDA ($10,X)        ; direct page indexed indirect with X
     JMP ($10,X)        ; absolute indexed indirect with X
That really looks inconsistent to me and just begs for a syntax adjustment that clarifies what's going on in the two cases.

Looking at the "ea65.c" source file, it certainly seems to be following the WDC manual.

"Yes", it's inconsistent, and the JMP is nasty anyway because it's a 16-bit absolute index into bank 0 instead of the DBR.

Unless you actually know and understand the weird kinks in the 65816 instruction set, I doubt that you're going to be able to understand the darned code anyway, even with a "[]" or a "{}" or whatever you choose to put in there.
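To show how differently the two identical-looking operands assemble, here is a tiny Python sketch of the encodings. The table and the `assemble` helper are made up for the illustration; the opcodes are the standard ones for LDA (dp,X) and JMP (abs,X).

```python
# mnemonic -> (opcode, operand bytes) for the "(operand,X)" syntax
INDEXED_INDIRECT_X = {
    "LDA": (0xA1, 1),   # LDA (dp,X): one operand byte, a direct page offset
    "JMP": (0x7C, 2),   # JMP (abs,X): two operand bytes, a 16-bit address
}

def assemble(mnemonic, operand):
    opcode, width = INDEXED_INDIRECT_X[mnemonic]
    return bytes([opcode]) + operand.to_bytes(width, "little")

lda_bytes = assemble("LDA", 0x10)   # LDA ($10,X)
jmp_bytes = assemble("JMP", 0x10)   # JMP ($10,X)
```

Same source text for the operand, but `$10` is a one-byte direct page offset in the LDA and a two-byte absolute address (`$0010`) in the JMP.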

I don't have much love for WDC or for the 65816.

IMHO, the 6809 was a much better example of how to extend/expand an existing architecture (the 6800) than the 65816 was.

Heck, I'd even prefer Intel's expansion of the 8080 into the 8086!

jonk

  • Sr. Member
  • ****
  • Posts: 273
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #51 on: May 23, 2016, 01:48:49 am »
There are all kinds of odd-ball addressing modes provided on the 65816 and the syntax used is kind of 'strained' to fit it.

I don't know how an assembler is supposed to handle these three, if external symbolics are used instead of a constant value. Worse, I'm not sure how an assembly programmer is supposed to be able to force an absolute indexed with X addressing mode if the absolute address is to be $0010. Or even accurately read source code using such symbols, given various assemblers and not knowing what some specific assembler does in each case. But here they are:
Code:
     LDA $10,X          ; direct page indexed with X
     LDA $1010,X        ; absolute indexed with X
     LDA $101010,X      ; absolute long indexed with X
Well, according to my reading of the standards, "LDA unknownvar,X" is going to assemble to a 2-byte address.
So if it is an external symbolic, it's going to be treated as a 2-byte absolute indexed with X mode and nothing else?

Arbitrary.

I'd like to see something more like one of these for the direct page mode:
Code:
     LDA DP:$10,X       ; direct page indexed with X
     LDA DP[$10],X      ; direct page indexed with X
     LDA [DP+$10],X     ; direct page indexed with X
Just something to point out that DP is involved, rather than having absolutely nothing there.

These two I can understand living with, allowing the linker to adjust them as needed and if needed:
Code:
     LDA $1010,X        ; absolute indexed with X
     LDA $101010,X      ; absolute long indexed with X
They really are similar enough that it doesn't grate on my nerves. But the direct page mode looking so similar, too? That seems almost criminal to me.



If the programmer wants something else, they have to use an override.

To force an absolute indexed addressing mode when the assembler already knows that the high-byte is zero, then you're also going to need an override ...

LDA |$1010,X

So, there are clear and simple rules ... they just don't always look nice, nor are they always obvious.
Okay. I've seen a number of what I consider to be ill-considered parsing short-cuts made in these assemblers. It looks as though they ran their eyes over the keyboard and looked for some as yet unused character they could bend to some novel purpose of the moment. And that's about all the thinking it got, looks like to me. And different people's eyes went to different places on their keyboards, too. Or the same places, but their brains said different things to them for the same cheesy short-cut character. Oh, well.

By comparison, MASM/ML is a dream of craftsmanship.


My old data sheet matches your section 3.5.3, and not the marketing-bulls*t in section 3.5.

The illustration is basically the same, but perhaps a tiny bit less ambiguous ...

Code:
|  DBR  | addrh | addrl |
       +|       | X Reg |
-------------------------
        effective address

So, I'm calling BS on section 3.5.
Assuming Xreg is set to 16-bit, I gather you also don't believe they are doing a 16-bit + 16-bit ALU add together with a ripple carry into a temporary copy of the DBR. But instead, just doing a 16-bit + 16-bit ALU add, tossing away the carry, and passing on the actual DBR in the uppermost 8-bit lane of the EA. One less clock and a fair bit less logic, that way. However, if I now go look at wdc_65816_manual.pdf (about 470 pages long) and look at page 289, they most definitely say "The Data Bank Register is concatenated with the 16-bit Operand: the 24 bit result is added to X (16 bits if 65802/65816 native mode, x = 0; else 8)." And they show a nice little diagram showing what appears to me to allow beyond-bank addressing. Doesn't seem to make any bones about it -- it should be able to reach past the DBR bank.

Hmm. You need to let me know where to go find some nice simulator source code so I can go see what is done. Do you know of a good place to read, where the code is nicely organized and readable? If not, I suppose I should just go find something myself and load it down. My son is using bsnes-plus, so I suppose that's what I should grab up if you don't have a better way to go, off-hand. Thanks if you do!


Yeah, we're coming at this from different perspectives.

I really don't care about the extra stuff in the 65816. There are already a couple of 65816 C compilers, like the WDC one ... and I believe that that one is supposed to be descended from whatever compiler was used (by those few people that used it) on the SNES.

I'm much more interested in the 6502-variants, particularly in the HuC6280 that's in the PC Engine, which is basically a 65C02 with bank mapping and a couple of extra instructions.
Assuming by "more interested in" you mean C compilers:

Well, so far as I can tell I can't actually use the WDC C compiler. Wants a license, it says. Not happening. Besides, doesn't it handle the 6502 and 65C02, already? So that if you are pointing there for "yet another C compiler already there for the 65816" aren't you also pointing there for "yet another C compiler already there for the 6502?"

Is there any more need for a 6502 C compiler than for a 65816 C compiler? Just curious, not offering. I'm not sure I am anywhere near wanting to consider writing a good C compiler project for the 6502 series now. A good one would be more than pedestrian work.

Of course, maybe you were talking about a different interest I didn't catch.


Nope, I'm talking about a quick-and-easy hack to an existing 6502 C compiler in an attempt to improve 80% of the parameter/local stack handling, at the expense of using dynamic heap-allocation for what would normally be stack-based local variables if they are deemed "too big".

The idea being to keep the regular C parameter/local stack addressable as "absaddr,X" instead of "[cstack],Y". It would make access to the C stack a lot quicker, but limited to 256 entries (byte, word or long).

That's "nasty" for a general-purpose compiler, but IMHO, would be acceptable for writing games on a console.
Ah, so you are talking about a C compiler. But just a "quick and easy" hack to one.

I'm just not at all interested in C for the 6502. It's such a small device and I'm perfectly happy with assembler for something like that. And I think you are, too, from some of your earlier comments. Ah, here it is:
Quote from: elmer
I don't quite understand that, myself, because by the time that you've learned all the compiler-specific rules to optimize the C output, and have totally changed the way that you write your C code, then IMHO you might as well have just written the whole thing in assembly-language using a good library of macros.
That one. So you can't really be all that interested in C compilers, then.

Or am I missing something?


Yes, that option basically just turns every local variable into a "static" variable with no attempt to optimize the usage with a call-tree trace in the way that you were thinking about above.

My cheap-and-nasty proposal is another one of the ideas on David's page, but it keeps the stack concept and tries to improve its performance.

The ideas aren't necessarily exclusive ... I can imagine scenarios when you might want to use both techniques.
Agreed. I think I understand better now.


I don't have much love for WDC or for the 65816.
Hehe. Okay. WDC is a different story. But processors are 'just processors.' They don't come ugly or pretty to me. They just do stuff and I adjust the way I think to fit. I can dislike assembler syntax. But that's not about the architecture. I like the PIC10/PIC12/PIC14/PIC16/PIC18, for example, because I can trivially lay out RTL for it almost in my sleep. (Not counting the peripherals, I admit.) All the CPU bits are exposed. Some people hate that. I don't love it. But I understand it, respect it, and don't hate it. Same with most things. I really liked the PDP-11, though. That was a marvelous set of well-considered compromises to use a 16-bit wide instruction space. (Ignoring some of the weird memory mapped hardware bits tacked on.) The 36 bit PDP-10? ASCII was fun there? The 60-bit Cyber Star? Now that was different fun -- to get the CPU working I'd have to adjust the distance from the CPU to memory. hehe. Anyway, they are all good. I don't care that much, so long as good folks worked hard and produced a reasonably complete design.


IMHO, the 6809 was a much better example of how to extend/expand an existing architecture (the 6800) than the 65816 was.

Heck, I'd even prefer Intel's expansion of the 8080 into the 8086!
Intel's 8088 (still have some here) was a really nice innovation for some of us. Working on the 8080 (which was mostly obnoxious because of its need for phased clocks and three darned power rails) and later on the 8085 (much nicer, hardware wise, finally) using 7400 series registers to add banking to the memory system?? Now that was a pain. Switch banks and you had better have code sitting there in the right place in the new bank, because the CPU had no idea what was going on. The 8088 provided a large number of fine-grained overlapping segments, which was MUCH nicer by comparison. HORRIBLE if you wanted to write C compilers and assembler tools. But really nice by comparison with what served beforehand.

The PDP-11 really remains my favorite, though. I've fallen in love with the DEC VAX, the Mot 88k and the TI TMS9900 in their days. I've worked on VLIW. I worked hard and learned to love the MIPS R2000 in its day and I really like the MIPS M4k core used in the PIC32, too. And, of course, worked at Intel around the time of the BX chipset and the Pentium II/Pentium Pro. (I do like the design of going from segmented memory via GDT/LDT/IDT to the paging system and from there to the physical memory pins. And the front side bus transaction design is nice, too. All that is more ornery than pretty, though.)

So the PDP-11 is where my heart stays. The JSR alone was a marvelous idea and no one else does it like that, even to this day. Lots of other good ideas in there, with tight constraints to boot, and all of it done with a sense of the elegant I don't usually see in a CISC CPU design. (I do like the all-out RISC approaches that MIPS took and that DEC did with their Alpha. But the Alpha exception process was an absolute nightmare. They took MIPS' RISC approach and pushed it to the absolute limits and beyond reason. I did like their decision to not support byte lane changes, though. That RISC decision was right. But that exception system? Wow! Insanity for those writing the handlers.)

EDIT: Cripes. I'm too old. I just read back over the list above and worried I should cut the list down. Then I realized that I had been cutting it down and that it was actually a severely curtailed digest. So what's the point cutting more out? There are so many more I've worked on not mentioned above. They go back to mercury delay line memories. Who remembers those? Or the later 1k to 8k drum memories, where a few k-byte would be the size of a washing machine? Anybody living still remember hand-wiring core memory? Oh, well. I better go get my cane and look for some whipper-snapper I can club over the head when they tell me that their terabyte disks and 32-gig RAM systems aren't big enough for them.
« Last Edit: May 23, 2016, 02:11:35 am by jonk »

AWJ

  • Full Member
  • ***
  • Posts: 105
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #52 on: May 23, 2016, 11:25:57 am »
Indexed addressing definitely can straddle banks on the 65816. The exception is indexed indirect jmp (jmp (jumptable,x)), where the pointer is fetched from the program bank rather than the data bank, and apparently it wraps within that bank (assuming bsnes is correct, and I believe its 65816 simulation has been extensively tested, down to edge cases like BCD arithmetic with illegal BCD values)
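The two readings of section 3.5.3 that were being argued over, plus the JMP exception, can be put side by side in a few lines of Python. The function names are made up for the illustration, and the "carry" variant reflects the behaviour described above as observed in bsnes, not something I have checked against silicon.

```python
def ea_abs_x_carry(dbr, operand, x):
    """LDA abs,X with DBR:operand as a 24-bit base; the add carries into the bank."""
    return ((dbr << 16) + operand + x) & 0xFFFFFF

def ea_abs_x_wrap(dbr, operand, x):
    """The other reading: a 16-bit add with the carry thrown away."""
    return (dbr << 16) | ((operand + x) & 0xFFFF)

def jmp_abs_x_pointer_addr(pbr, operand, x):
    """JMP (abs,X): the pointer is fetched from the program bank and the
    indexed address wraps within that bank."""
    return (pbr << 16) | ((operand + x) & 0xFFFF)

carry_ea = ea_abs_x_carry(0x7E, 0xFFF0, 0x20)
wrap_ea = ea_abs_x_wrap(0x7E, 0xFFF0, 0x20)
jmp_ptr = jmp_abs_x_pointer_addr(0x80, 0xFFF0, 0x20)
```

With DBR = `$7E`, operand `$FFF0` and X = `$20`, the carrying version reaches `$7F0010` in the next bank while the wrapping version would have stayed at `$7E0010`.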

jonk

  • Sr. Member
  • ****
  • Posts: 273
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #53 on: May 23, 2016, 12:13:25 pm »
Indexed addressing definitely can straddle banks on the 65816. The exception is indexed indirect jmp (jmp (jumptable,x)), where the pointer is fetched from the program bank rather than the data bank, and apparently it wraps within that bank (assuming bsnes is correct, and I believe its 65816 simulation has been extensively tested, down to edge cases like BCD arithmetic with illegal BCD values)
Thanks AWJ. It does seem as though this is strictly for the data memory, not code memory. It's just that the documents don't have anywhere near the clarity that I see in the Intel documents or in Microchip documents. Or for that matter, in pretty much anyone else's documents. Instead, I find all manner of mistakes in the documents from WDC. From simple visual mistakes and absolutely stupid and obvious "copying" mistakes, to more subtle ones. The sheer number of errors is stunning, though. And uncorrected in all these years. Oh, well.

Thanks again!

elmer

  • Full Member
  • ***
  • Posts: 122
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #54 on: May 23, 2016, 02:19:30 pm »
They really are similar enough that it doesn't grate on my nerves. But the direct page mode looking so similar, too? That seems almost criminal to me.

I wouldn't disagree, I like my code readable and to avoid too many deep-knowledge-required tricks (at least, those without detailed comments).


Quote
Assuming Xreg is set to 16-bit, I gather you also don't believe they are doing a 16-bit + 16-bit ALU add together with a ripple carry into a temporary copy of the DBR. But instead, just doing a 16-bit + 16-bit ALU add, tossing away the carry, and passing on the actual DBR in the uppermost 8-bit lane of the EA. One less clock and a fair bit less logic, that way.

Yep, that's what I was thinking ... and as AWJ pointed out, I was wrong.  :-[

I just took a look at the bsnes source that's embedded in Mednafen, and it's definitely doing a 24-bit add.


Quote
Hmm. You need to let me know where to go find some nice simulator source code so I can go see what is done. Do you know of a good place to read, where the code is nicely organized and readable? If not, I suppose I should just go find something myself and load it down. My son is using bsnes-plus, so I suppose that's what I should grab up if you don't have a better way to go, off-hand. Thanks if you do!

Sorry, I don't know of any simulator except for the one on the WDC site that's built into the WDC Tools.

The bsnes source in Mednafen was easy to read ... but I've never looked at bsnes-plus, so I've no idea if it is any different.


Quote
Assuming by "more interested in" you mean C compilers:

Well, so far as I can tell I can't actually use the WDC C compiler. Wants a license, it says. Not happening. Besides, doesn't it handle the 6502 and 65C02, already? So that if you are pointing there for "yet another C compiler already there for the 65816" aren't you also pointing there for "yet another C compiler already there for the 6502?"

Yes, I was curious to see if the WDC compiler produced any better 6502 code than the open source compilers like CC65 and HuC.

I just downloaded the WDC C compiler, and I'm getting the same license error as you.

Perhaps the license is built into the TIDE editor/environment???

Whether it is or not ... I think that we'd both have the same complaint ... I'd want to choose an open source compiler if I was going to work on any changes/upgrades.

And I'm "more interested in the 6502 variants" both in C terms, and in general terms.


Quote
Is there any more need for a 6502 C compiler than for a 65816 C compiler? Just curious, not offering. I'm not sure I am anywhere near wanting to consider writing a good C compiler project for the 6502 series now. A good one would be more than pedestrian work.

Not really a huge need. AFAIK they're both really only of interest to modern homebrew programmers, and I lean towards the opinion that folks should probably be writing in assembly on those architectures in the same way that the games were originally developed.

IIRC, C wasn't in standard use until the 5th-generation machines (PlayStation/Saturn/3DO).

OTOH ... I am contemplating doing an arcade port to the PC Engine, and having a C compiler that didn't suck could certainly speed up the development of the parts of that port that didn't need to be time-critical.

As such, I'm drawn to the CC65/CA65 toolchain because it makes it so easy to switch between languages in the same project, and it's been put together very well (IMHO).


Quote
Of course, maybe you were talking about a different interest I didn't catch.

Ah, so you are talking about a C compiler. But just a "quick and easy" hack to one.

I'm just not at all interested in C for the 6502. It's such a small device and I'm perfectly happy with assembler for something like that. And I think you are, too, from some of your earlier comments. Ah, here it is: That one. So you can't really be all that interested in C compilers, then.

Or am I missing something?

Nope, you're basically right. I don't have the "passion" for creating a C compiler from scratch ... but I wouldn't object to putting in some work to implement a few simple improvements to one, if they'd help both me, and other developers, on the platforms that I care about.

For me, that's the 6502, and not the 65816.

Hahaha ... I really don't find a 7MHz 65C02-variant with 2.5M RAM (that's bytes, not bits) and a 650MB CD (for streaming code/data/audio) to be a particularly "small" device.  ;)


Quote
Hehe. Okay. WDC is a different story. But processors are 'just processors.' They don't come ugly or pretty to me. They just do stuff and I adjust the way I think to fit.

Ahhh ... I'm a little more critical and less forgiving.

I don't have anywhere near the breadth of experience that you have. It looks like you've been active at just the right time, and in just the right field, to have seen pretty-much the entire history of microprocessor development.

Of the dozen-or-so architectures that I have seen, some stand out as really well thought out, and some don't.

The expansion of the 8080 into the 8088 was a really nice piece of work. Yes, the limitations became apparent, and caused problems later ... but they seemed really well thought out for the situation that existed at the time.

I just took a look at the wikipedia page on the PDP-11, and yes, that JSR is a really nice idea!

The early RISC architectures are definitely interesting, and seemed like a huge change coming from CISC platforms like the 68000.

For some reason, I could never love the MIPS architecture, and Hitachi's SuperH (SH-2) is IMHO an abomination, but I recently found the NEC V810 RISC architecture (in the VirtualBoy and the PC-FX), and that seems really well thought out, to me.


Quote
So the PDP-11 is where my heart stays. The JSR alone was a marvelous idea and no one else does it like that, even to this day. Lots of other good ideas in there, with tight constraints to boot, and all of it done with a sense of the elegant I don't usually see in a CISC CPU design.

Hahaha, yep, I just took a look at the NatSemi 32000 series again with the recent release of an upgraded FPGA implementation (the M32632).

It's definitely an interesting classic-CISC processor design, but I don't think that anyone would ever use the word "elegant" in describing it.

jonk

  • Sr. Member
  • ****
  • Posts: 273
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #55 on: May 23, 2016, 04:18:02 pm »
I just took a look at the bsnes source that's embedded in Mednafen, and it's definitely doing a 24-bit add.
Yeah. This morning I also looked at the bsnes-plus source code, too. It's all in the ../cpu/core directory. Nice.

Sorry, I don't know of any simulator except for the one on the WDC site that's built into the WDC Tools.
The only emulators I recognize as an emulator are built on hardware. An emulator emulates hardware. Period. Everything else is a simulator to me. Microchip makes a simulator that runs in their MPLAB IDE. It simulates the cpu, peripherals, memory, etc. It's still a simulator, not an emulator. I have a Microchip emulator -- it has a nice little pod I can plug into a cpu socket. That's an emulator. When I worked at Intel, we had a huge box of FPGAs and out of that mess there was a nice cable and ... yup ... a pod that could fit into the CPU socket of a motherboard. And guess what? It ran the motherboard from the RTL (VHDL or Verilog plus floorplanning) from that huge FPGA box. That's also an emulator.

If it doesn't emulate a piece of hardware I can drop into other hardware, it's just a sim to me.

Someday I might change. But 40 years is a hard habit to break.

I just downloaded the WDC C compiler, and I'm getting the same license error as you.

Perhaps the license is built into the TIDE editor/environment???

Whether it is or not ... I think that we'd both have the same complaint ... I'd want to choose an open source compiler if I was going to work on any changes/upgrades.
Yup. So I'm writing off the WDC C compiler. It would probably be a nightmare trying to get ahold of them, anyway. I suspect they are at minimum staff, just barely enough to manage their IP and count the dollars coming in. I don't think they are pro-active anymore, if they ever were. I think they just wait around, instead, and pocket the IP bucks.

And I'm "more interested in the 6502 variants" both in C terms, and in general terms.
The only experience I have, commercially, with 6502 variants is with the Seiko message watch some years ago. But this makes my point here. I don't know of anyone using the 6502 in a general-purpose, end-user-quantity situation. Is anyone doing homebrew 6502 boards anymore? I might have a few CPUs here, still. In a box somewhere. But I will probably never build anything with them. Does anyone? Are they still sold?

Seems to me it is more of a "rice cooker" style, one million unit order size kind of thing. That fits my Seiko experience, where they needed some very specific features added. You either hire your own ASIC designer and WDC provides you with the basics to work with, licensing your rights; or else you hire WDC and let them contract that out for you.

But I thought the 6502 was otherwise kind of dead to the hobby world. Just folks who can buy an old Apple II, an SNES, or a NES, or something like that.

Not much to hang a hat on, if considering C compiler efforts.

AFAIK they're both really only of interest to modern homebrew programmers, and I lean towards the opinion that folks should probably be writing in assembly on those architectures in the same way that the games were originally developed.
Agreed, of course. Except that I'm really curious if there are ANY homebrewers. Who wire-wraps anymore? It's not hard to get a board built, but you have to do layout and order a panel's worth. Could do it 'dead bug' I suppose. But is anyone doing 6502 homebrew?

IIRC, C wasn't in standard use until the 5th-generation machines (PlayStation/Saturn/3DO).
Your memory would be better than mine on this. I don't have a reason to dispute your comment, though. And it is consistent with what I think I know.

OTOH ... I am contemplating doing an arcade port to the PC Engine, and having a C compiler that didn't suck could certainly speed up the development of the parts of that port that didn't need to be time-critical.

As such, I'm drawn to the CC65/CA65 toolchain because it makes it so easy to switch between languages in the same project, and it's been put together very well (IMHO).
Now that's intriguing. What is the PC Engine? I'd like to see a description of it. Thanks...

I don't have the "passion" for creating a C compiler from scratch ... but I wouldn't object to putting in some work to implement a few simple improvements to one, if they'd help both me, and other developers, on the platforms that I care about.
I sometimes wonder which is easier, modifying someone else's mess or writing my own. Data structure design is so crucial to helping simplify the resulting code and make it more robust to future change. Given that I have a small bit of experience with parsing and compilers, it's often easier for me to craft one than to wade through bad initial design decisions and later, ugly, horrible grafting work to hack in functionality they should have considered before starting out.

But I get your point, too. You want to choose the least-time path, whatever that looks like. It's just that different people will see that least-time path differently, I suppose.

Hahaha ... I really don't find a 7MHz 65C02-variant with 2.5M RAM (that's bytes, not bits) and a 650MB CD (for streaming code/data/audio) to be a particularly "small" device.  ;)
Well, 2.5MB RAM and a CD means the system isn't "small." But the CPU is still small. It's a 1975 device and is probably some 3-4 thousand equivalent transistors (in CMOS it's all inverters and transmission gates, really, but who's counting?) If you made that in a current tech Intel FAB it would probably (not counting pad outs and external drivers) work out to 1 micron by 1 micron in size! Smaller than a lot of bacteria. Too small to see by eye, even on an absolutely clean and polished silicon wafer.

It's so close to zero, you couldn't detect the difference.

7MHz? Cripes. That thing could be running at GHz if someone actually used a modern FAB on it. At 7MHz, it's probably being built in someone's clay oven, next to some pottery they are also making. The masks are probably hand-painted on the surface, etching done in a wash basin, and a polishing step with scotch-brite pad. ;) (I've built small demo FABs in my garage using a nickel plated chamber and water cooling, by the way.)

Ahhh ... I'm a little more critical and less forgiving.

Of the dozen-or-so architectures that I have seen, some stand out as really well thought out, and some don't.

The expansion of the 8080 into the 8088 was a really nice piece of work. Yes, the limitations became apparent, and caused problems later ... but they seemed really well thought out for the situation that existed at the time.
I like any decent design that I learn from. Most designs have thousands of constraints imposed on them and I'm often just impressed when I see how well the engineers navigated through them. It's often pretty remarkable. In most cases, I learn something new, too.

I just took a look at the wikipedia page on the PDP-11, and yes, that JSR is a really nice idea!
It's so useful -- especially for co-routines. Nothing like it in anything today. Too bad that state of the art wasn't remembered and/or retained in at least some newer designs.

The early RISC architectures are definitely interesting, and seemed like a huge change coming from CISC platforms like the 68000.

For some reason, I could never love MIPS architecture,
I remember flying in to MIPS and seeing Dr. Hennessy there. He had a huge mural behind glass of the 68020 processor's die. And he'd start there, describing how much of that die was "wasted" by sequencing logic and control store. About 70% as I recall. The rest, he'd say, was functional units and registers. But the 70% did nothing itself to add processing power. It was there just to make the instruction set "nice."

Motorola and Intel would be hide-bound before they'd sell any of their fancy FAB capacity to a competitor. (Their FABs were the most advanced and the most expensive.) MIPS could only buy "hand-me-down" FAB access, which meant roughly 150k transistor equivs when Intel and Motorola were fielding 4 million+ dies. So MIPS had to do, with 150k, what Intel and Mot were doing with millions. And MIPS DID! It was amazing to see.

MIPS stripped everything down. They had to, of course. They went straight to high clock rates, which meant high quality caching of memory, separation of instruction and data caches, the shortest possible combinatorial logic chains, and as much pipelining as possible. They didn't want to take a hit for a branch, either. Normally, the memory system is feeding the IR (instruction register) in a separate pipe. It doesn't know anything about branches. It just loads, loads, loads, etc. When a branch takes place, the IR is already loaded with the instruction after the branch and is already decoded. So now what? Toss it away and force a pipeline stall to wait while the target is re-loaded? No way. So MIPS said, nope. We execute that instruction regardless. You don't like that? Stick a NOP there. Hard luck. Besides, adding logic to handle the stall would insert something into the critical path and lengthen the clock period. Sorry. Not happening. Same thing with register interlocks. Register reads heading to the ALU are done in parallel with register writes. If you write a register in a prior instruction, that write will actually still be in the pipeline when the next instruction reads the register. Normal folks add 'interlocks' and use these to stall the system to allow the write to occur (if they pipeline, at all.) Not MIPS. You get the old value, not the newly written one. Don't like that? Insert a NOP or find something else that is useful to do. The interlock adds logic, lengthens the combinatorial logic chain, and therefore may lengthen the cycle time.
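
The branch behaviour described above can be sketched with a toy interpreter. This is only an illustration, not MIPS itself: the three-instruction "ISA" (`addi`, `j`, `nop`), the register names, and the `run` function are all made up for the example. With `delay_slot=True` the instruction sitting right after a jump is executed before control reaches the target, which is the "we execute that instruction regardless" rule.

```python
def run(program, delay_slot=True):
    """Run a toy program and return the final register values.

    program is a list of tuples: ("addi", reg, imm), ("j", target), ("nop",).
    The instruction set is hypothetical; only the delay-slot rule is the point.
    """
    regs = {"r1": 0, "r2": 0}
    pc = 0
    pending = None  # jump target waiting for the delay-slot instruction to finish
    while pc < len(program):
        op, *args = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # We are executing the delay slot; afterwards go to the jump target.
            next_pc = pending
            pending = None
        if op == "addi":
            reg, imm = args
            regs[reg] += imm
        elif op == "j":
            (target,) = args
            if delay_slot:
                pending = target   # the jump takes effect one instruction late
            else:
                next_pc = target   # conventional machine: jump immediately
        # "nop" does nothing
        pc = next_pc
    return regs


USEFUL_SLOT = [
    ("j", 3),
    ("addi", "r1", 1),    # delay slot: filled with useful work
    ("addi", "r2", 100),  # skipped either way
    ("addi", "r2", 1),    # jump target
]

NOP_SLOT = [
    ("j", 3),
    ("nop",),             # delay slot: nothing useful found, so pad with a NOP
    ("addi", "r2", 100),
    ("addi", "r2", 1),
]
```

Running `run(USEFUL_SLOT)` leaves `r1` at 1 because the slot instruction executed, while `run(USEFUL_SLOT, delay_slot=False)` leaves it at 0. `NOP_SLOT` shows the "stick a NOP there" fallback.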

Lots of decisions like that. They were working with poor-man's FABs and had to compete with what they could access. And they did such a good job of it that it forced Intel to build RISC into their x86 family. (I was there, saw it happen. Intel was scared!) You have to respect what they achieved and how they achieved it.

and Hitachi's SuperH (SH-2) is IMHO an abomination, but I recently found the NEC V810 RISC architecture (in the VirtualBoy and the PC-FX), and that seems really well thought out, to me.
The Hitachi H8 was pretty interesting. I did do some work designing boards with it and programming it. They had a nice idea, too, of making it "EPROM" compatible so that you could drop the chips into a standard EPROM programmer. And the instruction set was "pretty."

Hahaha, yep, I just took a look at the NatSemi 32000 series again with the recent release of an upgraded FPGA implementation (the M32632).

It's definitely an interesting classic-CISC processor design, but I don't think that anyone would ever use the word "elegant" in describing it.
I'd forgotten the 32k. Now that was an odd lot of stuff. The timing controller... TCU... wow. I remember spending time studying that one.
« Last Edit: May 23, 2016, 06:09:38 pm by jonk »
An equal right to an opinion isn't a right to an equal opinion. -- 1995, me
Saying religion is the source of morality is like saying a squirrel is the source of acorns.  -- 2002, me

STARWIN

  • Sr. Member
  • ****
  • Posts: 456
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #56 on: May 23, 2016, 06:11:46 pm »
The only emulators I recognize as an emulator are built on hardware.

I think the interpretation that has managed to establish itself makes more sense than yours. Hardware is just software with errors. If someone gives you a black box which emulates something in a certain configuration, by your definition you can't be certain whether it is an emulator or a simulator.

Regarding MIPS, for some tiny hacks it can be convenient to have a NOP around, and with fixed size instructions too. I wish there was a NOP after each instruction!

jonk

  • Sr. Member
  • ****
  • Posts: 273
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #57 on: May 23, 2016, 06:24:27 pm »
I think the interpretation that has managed to establish itself makes more sense than yours. Hardware is just software with errors. If someone gives you a black box which emulates something in a certain configuration, by your definition you can't be certain whether it is an emulator or a simulator.
I have 40 years of shared terminology usage from long experience with other designers and it's a habit I'm not breaking. It would require 'convoluted' thinking on my part and that leads to making internal mental mistakes. So I'm not debating this. I'm just sharing my meaning with others, in case it helps them understand my wording better. Elmer misunderstood what I was saying, so I explained myself further. I've no interest debating or arguing or trying to change your mind or anyone else's about your use of terms. I'm just not changing mine, either. Not yet, anyway.

Regarding MIPS, for some tiny hacks it can be convenient to have a NOP around, and with fixed size instructions too. I wish there was a NOP after each instruction!
The assembler did some nice "back-filling" work with their re-organizer. So it "helped" you out, if you wanted the help. I've seen NOPs used profusely. But I never bothered. It was a very simple to understand processor. By comparison, I did more than a decade of programming on the ADSP-21xx DSP processor family. You could do a read, an ALU op, and a write all in the exact same, one cycle (50ns or so) instruction. I had a 1024-point complex input, complex output FFT that ran in well under 3ms on that beast. And it didn't support FP in the chip, by the way, so the FP was software with some barrel shifter hardware support. If you work on a processor like the ADSP-21xx for much time, you get really really good at handling parallel pipelines and back-filling code all over the place to maximize speed and minimize code and data size. MIPS R2000 assembly was dead easy by comparison.
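
To put those FFT numbers in perspective, here is a rough cycle-budget calculation. It assumes a plain radix-2 algorithm (the post doesn't say which algorithm was used), takes the "50ns or so" cycle at face value, and uses 3ms as the upper bound; the function name is made up for the sketch.

```python
def fft_cycle_budget(n_points=1024, cycle_ns=50, time_budget_ns=3_000_000):
    """Return (butterflies, cycles available, cycles per butterfly).

    Assumes a radix-2 FFT: log2(N) stages of N/2 butterflies each.
    """
    stages = n_points.bit_length() - 1       # log2(N) for a power of two
    butterflies = (n_points // 2) * stages   # N/2 butterflies per stage
    cycles = time_budget_ns // cycle_ns      # instruction cycles in the time budget
    return butterflies, cycles, cycles / butterflies
```

With the defaults this works out to 5120 butterflies in 60000 cycles, so fewer than 12 single-cycle instructions per complex butterfly, loop overhead and software floating point included. That only works if nearly every instruction does a read, an ALU op, and a write at once.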

elmer

  • Full Member
  • ***
  • Posts: 122
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #58 on: May 23, 2016, 06:31:05 pm »
If it doesn't emulate a piece of hardware I can drop into other hardware, it's just a sim to me.

Hahaha, yep, we come from different job domains. I've only seen a couple of classic hardware-emulators for the CPU used in development over the years.

In my job we usually had much-simpler hardware-emulators for the system's ROM.


Quote
But this makes my point here. I don't know of anyone using a general-purpose, end-user quantity situation with the 6502. Is anyone doing homebrew 6502 boards anymore? I might have a few CPUs here, still. In a box somewhere. But I will probably never build anything with them. Does anyone? Are they still sold?

Ah ... our differences are showing again.

I've seen very little homebrew hardware over the years since the end of the 70s and the Altair and other card-based CP/M machines.

The last one that I saw was some Brazilian guy that was creating an MSX-clone machine from original Z80 parts (plus some modern logic).

"Homebrew" in the games and game-hacking world is just software-homebrew.

That's people writing complete new games/utilities for old games consoles/computers, rather than hacking an existing ROM to create a translation or to modify it.

And "yes", there are definitely quite a few folks doing that for various old machines that they love.

This just isn't really the forum where those kind of folks hang out.

From what I've seen, you're more likely to see them where the game players for their specific beloved-console hang out.

If you want to see people still using the 6502, you can look at places like nesdev.com or atariage.com, or, in my case, pcenginefx.com


Quote
Now that's intriguing. What is the PC Engine? I'd like to see a description of it.

It's a game console that came out in 1987, the 1st of the 4th-generation machines. It was the first game console with a CD-ROM drive available for it.

Over here in America it was called the Turbo Grafx ... and it failed pretty badly for a variety of reasons.

But in Japan, it knocked the NES off the top spot for a time, until the SNES came out, and it outsold the Sega MegaDrive/Genesis.

Not much respect for it here in America by the average gamer, but well-loved by the people that know it and play the games.

Ground-breaking at the time for having excellent RPG games with CD soundtracks and voice acting years before anyone else.


Quote
I sometimes wonder which is easier, modifying someone else's mess or writing my own. Data structure design is so crucial to helping simplify the resulting code and make it more robust to future change. Given that I have a small bit of experience with parsing and compilers, it's often easier for me to craft one than to wade through bad initial design decisions and later, ugly, horrible grafting work to hack in functionality they should have considered before starting out.

But I get your point, too. You want to choose the least-time path, whatever that looks like. It's just that different people will see that least-time path differently, I suppose.

Yep, that's exactly it. I don't have the background to quickly knock-up my own ANSI-C compatible compiler, so the quick alternative is to either improve someone else's, or just live with a good macro-assembler (which wouldn't bother me at all).


Quote
Motorola and Intel would be hide-bound before they'd sell any of their fancy FAB capacity to a competitor. (Their FABs were the most advanced and the most expensive.) MIPS could only buy "hand-me-down" FAB access, which meant roughly 150k transistor equivs when Intel and Motorola were fielding 4 million+ dies. So MIPS had to do, with 150k, what Intel and Mot were doing with millions. And MIPS DID! It was amazing to see.

Thanks for the background, that really does help put it all into perspective.

I guess that it must have been amazing to see from a front-row position.

But as an assembly-language programmer, all those nop-or-else delay rules were a royal PITA. Sure, the compiler could handle them easily, but compilers were pretty lousy in those days.

And if we're going to go with the perspective of the-little-engine-that-could, then I'd probably bring up the ARM processor as the product of a small company that managed to design a really powerful and pleasant RISC-like CPU.

BTW ... going back to the PDP-11's JSR instruction. Isn't that basically the CISC version of having the return address put into a link-register?

The NEC V850 (still in production by Renesas) lets you use a register to hold the return address with its JAL instruction.

jonk

  • Sr. Member
  • ****
  • Posts: 273
    • View Profile
Re: 65816: Direct Page vs Absolute Operand
« Reply #59 on: May 23, 2016, 07:24:02 pm »
I've seen very little homebrew hardware over the years since the end of the 70s and the Altair and other card-based CP/M machines.
That's too bad. I still do a lot of it. I wire-wrap, dead-bug, modify something I can buy cheap, or I go buy a panel and solder one up. Once the skills are there, you use them I suppose.

"Homebrew" in the games and game-hacking world is just software-homebrew.

That's people writing complete new games/utilities for old games consoles/computers, rather than hacking an existing ROM to create a translation or to modify it.

And "yes", there are definitely quite a few folks doing that for various old machines that they love.
Okay. That's a small stretch to wrap my mind around. But I can do it. I just need to figure out what someone (you) means when the word is written out, I suppose. Context. I'll struggle and remember that in this site 'homebrew' never means hardware and always means writing complete games instead of modifying old ones.

If you want to see people still using the 6502, you can look at places like nesdev.com or atariage.com, or, in my case, pcenginefx.com
I'll give those a look.

It's a game console that came out in 1987, the 1st of the 4th-generation machines. It was the first game console with a CD-ROM drive available for it.

Over here in America it was called the Turbo Grafx ... and it failed pretty badly for a variety of reasons.

But in Japan, it knocked the NES off the top spot for a time, until the SNES came out, and it outsold the Sega MegaDrive/Genesis.

Not much respect for it here in America by the average gamer, but well-loved by the people that know it and play the games.

Ground-breaking at the time for having excellent RPG games with CD soundtracks and voice acting years before anyone else.
Okay. So I may someday think about it when I find a circumstance providing motivation.

Yep, that's exactly it. I don't have the background to quickly knock-up my own ANSI-C compatible compiler, so the quick alternative is to either improve someone else's, or just live with a good macro-assembler (which wouldn't bother me at all).
They aren't all that hard. Most of the work goes into all the target-specific stuff and the bells and whistles everyone demands (like colored keywords in their editor, and mostly useless cr*p like that.) Some goes into serious, useful optimizations, though. The parsing stuff and, if you don't care about optimization at all, the code generation stuff is pretty darned easy. Boilerplate.

Thanks for the background, that really does help put it all into perspective.

I guess that it must have been amazing to see from a front-row position.
Intel was VERY smart. (Is, I suppose, though I think NVIDIA is giving them some heartburn in some ways today.) Their processors became their largest profit center in 1985. Around that time, MIPS was showing up on the scene. The 80386 used the 1960s Multics address translation scheme, almost verbatim. That scheme was very well designed, and designed a long time before Intel picked it up. But it was slow on the 80386.

Somewhat prior, step 1 in this story, folks had learned how to finesse the original IBM BIOS and to provide (except for the ROM BASIC) a near complete functionality (99.9% compatible) of the IBM with the Kaypro 286i. (The very first machine to get it almost entirely right.) The competition was at first simply about clock rates, as Intel slowly increased the speed of their 80286 devices upwards from around 6MHz towards 10, 12, and 16MHz. But there was a huge barrier -- the bus was synch'd to the CPU, so as the CPU rate went up the bus rate also had to go up with it. But that "broke" the boards plugged into the system, as most of them couldn't go any faster than about 8.5MHz or so. (I know, I screwed around with that for a while.) So manufacturers needed to decouple the bus rate from the CPU rate.

Step 2 in the story is about figuring out how to do that. It took a boat-load of 7400 series chips and the next spate of computer motherboards were a veritable sea of socketed 7400 parts. It was something to behold. But it worked. That was about the time that Chips&Tech formed, to help solve this problem with ASICs. C&T made chipsets which provided all of the decoupling required without all those individual logic chips. This greatly simplified design for motherboard manufacturers and there was another spurt of growth.

Step 3 was the period when C&T really grew and made buckets of money that Intel wanted back. That will get us to step 5. But also in step 3 was the moment when Intel 'discovered' that selling ICs to customers at K-mart was a LOT more profitable than selling them in thousands to some ornery engineers. Intel had gotten manufacturers to include co-processor chip sockets for the 80286 and 80386. But it was only around the time when the 80386 was out that they really started to sell through at places like K-mart. And the profits were like a drug -- Intel couldn't get enough of that.

Now step 4. Intel was loving the "sell ICs to stupid end-users for stupid money" mode and really came up with some serious scams with the 80486 family. They were still griped about C&T, too. But right now they were dialed into the idea of selling ICs, one at a time, to end-users who couldn't open their wallets fast enough. So they came up with the 80486SX and 80486DX. The 80486 yield wasn't perfect. Some of them didn't have decent FP. So they worked out how to turn that part off, if it didn't 'yield,' and to sell bad chips by the bucket. They encouraged motherboard makers to add an 80487SX socket to their motherboards, offered to sell 80486SX chips "on the cheap" to further encourage them, and really tried to make it harder for motherboard makers to consider a full-up 80486DX board. At the beginning of this cycle, the 80486DX boards were all you could get. But by the end, they were nearly unobtainium because Intel was so good at this. The end user, of course, would eventually "figure out" that they were sold an 80486SX and that they really wanted the floating point added in and were willing to go buy a chip. The funny thing? The 80487SX was just a rebonded 80486DX chip. In fact, all of the 80486 line was the SAME darned die. 80486DX is a fully tested die. 80486SX is a bad-FP-test die toggled to inhibit its FP unit. 80487SX was just a rewired, good-tested 80486DX that lifted the 80486SX off the bus and just took over. Basically, Intel got people to buy dog poop to start and then sold them the real deal over the K-mart counter, one at a time, later. They bypassed those pesky engineers, too.

Meanwhile, Intel was busy with step 5, solving another problem: C&T. With the 80486, they started moving themselves into C&T's chipset business. It wouldn't happen all at once, of course, but gradually during step 6, they pretty much took over the chipset market.

Step 6 was about getting rid of all those thousands of mom and pop motherboard manufacturers. So many of them meant lots of retail competition. That meant low manufacturing costs. Which meant low chip prices to Intel. Intel considered that a bad thing. So they needed to greatly reduce the number of motherboard manufacturers. The idea came with the PCI bus -- sold as a "green" bus idea (it's reflection wave, instead of incident wave, and that actually is lower power.) Its real intent was to elevate the cost of equipment needed to make motherboards and add-in cards, though. An oscilloscope for a regular ISA bus would set you back a couple of grand. But for the PCI bus? You'd need to start at $100k each. And work up from there. It is a very expensive bus to design for. And that worked. It literally killed the mom-and-pop businesses. And reduced the manufacturer count dramatically, leaving only very well funded businesses in that market. This allowed chip prices to rise, etc. Another problem solved. (They still will tell you it was all about being green -- and that is the part-truth that makes the lie so much better.)

Then there were more steps (graphics, etc.) But that gives a thumbnail. Intel uncovered each challenge to profit and solved it. They made some mistakes along the way (one of them cost $250 million in a single quarter buying back their own memory.) But they generally did smart things to muscle themselves into existing profitable 'claims' after others worked hard to prove out those claims and demonstrate value there. Intel would then march in and suck it up with a strategy in mind.

But as an assembly-language programmer, all those nop-or-else delay rules were a royal PITA. Sure, the compiler could handle them easily, but compilers were pretty lousy in those days.

And if we're going to go with the perspective of the-little-engine-that-could, then I'd probably bring up the ARM processor as the product of a small company that managed to design a really powerful and pleasant RISC-like CPU.
ARM is a fantastic success story!! It never was able to perform like some of the best RISC cases (power, speed, etc.) But it performed 'well-enough.' And their strategies were excellent. (Hmm. Reminds me now of SPARC, which also competed for a while.) Today, there is little like them. If you want multiple source CPUs, it is either x86 or else ARM. Everything else is single-source. Or close to it. The compilers for the x86 and ARM are as good as they get, too. No place better to go. I love ARM and have lots of development tools here for them, as well. (In circuit JTAG stuff, for example.)

BTW ... going back to the PDP-11's JSR instruction. Isn't that basically the CISC version of having the return address put into a link-register?

The NEC V850 (still in production by Renesas) lets you use a register to hold the return address with its JAL instruction.
Hmm. I admit I stepped by the V850. Probably the only processor I haven't touched! hehe. I'll have to go look.

JSR on the PDP-11, together with its very orthogonal addressing modes, provides more than just one or two completely separate concepts for subroutines, tasks, co-routines, and threading. I'll have to look at the V850 case to see if it can touch all of them. Might be. I wouldn't know.
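
For anyone who hasn't used the PDP-11, here is a small model of what JSR and RTS do to the registers and stack. It is a sketch, not an emulator: the `PDP11` class, its method names, and the addresses are invented for the illustration, and only the register/stack bookkeeping is modelled. It shows two of the uses: `JSR R5,sub` behaving like a link register (the old R5 is saved on the stack for you), and `JSR PC,@(SP)+` swapping the PC with the top of the stack, which is the classic co-routine switch.

```python
class PDP11:
    """Toy model of PDP-11 JSR/RTS bookkeeping (registers and stack only)."""

    def __init__(self):
        self.mem = {}                                  # word-addressed stack memory
        self.r = {"R5": 0, "SP": 0o1000, "PC": 0}

    def push(self, value):
        self.r["SP"] -= 2
        self.mem[self.r["SP"]] = value

    def pop(self):
        value = self.mem[self.r["SP"]]
        self.r["SP"] += 2
        return value

    def jsr(self, reg, target, return_addr):
        """JSR reg,dst: push reg; reg <- return address; PC <- dst."""
        self.r["PC"] = return_addr       # PC already points past the JSR
        self.push(self.r[reg])
        self.r[reg] = self.r["PC"]
        self.r["PC"] = target

    def rts(self, reg):
        """RTS reg: PC <- reg; reg <- popped word."""
        self.r["PC"] = self.r[reg]
        self.r[reg] = self.pop()

    def co_switch(self, return_addr):
        """JSR PC,@(SP)+: the destination is popped before the PC is pushed."""
        target = self.pop()
        self.jsr("PC", target, return_addr)


def link_register_demo():
    m = PDP11()
    m.r["R5"] = 0o7
    m.jsr("R5", 0o3000, return_addr=0o1004)
    during_call = (m.r["PC"], m.r["R5"], m.mem[m.r["SP"]])
    m.rts("R5")
    return during_call, (m.r["PC"], m.r["R5"])


def coroutine_demo():
    m = PDP11()
    m.push(0o2000)                       # resume address of the other co-routine
    m.co_switch(return_addr=0o1004)      # A yields to B
    first = (m.r["PC"], m.mem[m.r["SP"]])
    m.co_switch(return_addr=0o2010)      # B yields back to A
    second = (m.r["PC"], m.mem[m.r["SP"]])
    return first, second
```

In `link_register_demo` R5 holds the return address during the call and gets its old value back on RTS. In `coroutine_demo` each switch resumes the other routine exactly where it last left off, with no extra state beyond one word on the stack.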
« Last Edit: May 23, 2016, 10:08:04 pm by jonk »