I just took a look at the bsnes source that's embedded in Mednafen, and it's definitely doing a 24-bit add.
Yeah. This morning I looked at the bsnes-plus source code, too. It's all in the ../cpu/core directory. Nice.
Sorry, I don't know of any simulator except for the one on the WDC site that's built into the WDC Tools.
The only things I recognize as emulators are built on hardware. An emulator emulates hardware. Period. Everything else is a simulator to me. Microchip makes a simulator that runs in their MPLAB IDE. It simulates the CPU, peripherals, memory, etc. It's still a simulator, not an emulator. I have a Microchip emulator -- it has a nice little pod I can plug into a CPU socket. That's an emulator. When I worked at Intel, we had a huge box of FPGAs and out of that mess there was a nice cable and ... yup ... a pod that could fit into the CPU socket of a motherboard. And guess what? It ran the motherboard from the RTL (VHDL or Verilog plus floorplanning) from that huge FPGA box. That's also an emulator.
If it doesn't emulate a piece of hardware I can drop into other hardware, it's just a sim to me.
Someday I might change. But 40 years is a hard habit to break.
I just downloaded the WDC C compiler, and I'm getting the same license error as you.
Perhaps the license is built into the TIDE editor/environment???
Whether it is or not ... I think that we'd both have the same complaint ... I'd want to choose an open source compiler if I was going to work on any changes/upgrades.
Yup. So I'm writing off the WDC C compiler. It would probably be a nightmare trying to get ahold of them, anyway. I'm sure they are at minimum staff, just barely enough to manage their IP and count the dollars coming in. I don't think they are pro-active anymore, if they ever were. I think they just wait around, instead, and pocket the IP bucks.
And I'm "more interested in the 6502 variants" both in C terms, and in general terms.
The only experience I have, commercially, with 6502 variants is with the Seiko message watch some years ago. But this makes my point here. I don't know of anyone using the 6502 in a general-purpose, end-user-quantity situation. Is anyone doing homebrew 6502 boards anymore? I might have a few CPUs here, still. In a box somewhere. But I will probably never build anything with them. Does anyone? Are they still sold?
Seems to me it is more of a "rice cooker" style, one million unit order size kind of thing. That fits my Seiko experience, where they needed some very specific features added. You either hire your own ASIC designer and WDC provides you with the basics to work with, licensing your rights; or else you hire WDC and let them contract that out for you.
But I thought the 6502 was otherwise kind of dead to the hobby world. Just folks who can buy an old Apple II, an SNES, or a NES, or something like that.
Not much to hang a hat on, if considering C compiler efforts.
AFAIK they're both really only of interest to modern homebrew programmers, and I lean towards the opinion that folks should probably be writing in assembly on those architectures in the same way that the games were originally developed.
Agreed, of course. Except that I'm really curious if there are ANY homebrewers. Who wire-wraps anymore? It's not hard to get a board built, but you have to do layout and order a panel's worth. Could do it 'dead bug' I suppose. But is anyone doing 6502 homebrew?
IIRC, C wasn't in standard use until the 5th-generation machines (PlayStation/Saturn/3DO).
Your memory would be better than mine on this. I don't have a reason to dispute your comment, though. And it is consistent with what I think I know.
OTOH ... I am contemplating doing an arcade port to the PC Engine, and having a C compiler that didn't suck could certainly speed up the development of the parts of that port that didn't need to be time-critical.
As such, I'm drawn to the CC65/CA65 toolchain because it makes it so easy to switch between languages in the same project, and it's been put together very well (IMHO).
Now that's intriguing. What is the PC Engine? I'd like to see a description of it. Thanks...
I don't have the "passion" for creating a C compiler from scratch ... but I wouldn't object to putting in some work to implement a few simple improvements to one, if they'd help both me, and other developers, on the platforms that I care about.
I sometimes wonder which is easier, modifying someone else's mess or writing my own. Data structure design is so crucial to helping simplify the resulting code and make it more robust to future change. Given that I have a small bit of experience with parsing and compilers, it's often easier for me to craft one than to wade through bad initial design decisions and later, ugly, horrible grafting work to hack in functionality they should have considered before starting out.
But I get your point, too. You want to choose the least-time path, whatever that looks like. It's just that different people will see that least-time path differently, I suppose.
Hahaha ... I really don't find a 7MHz 65C02-variant with 2.5M RAM (that's bytes, not bits) and a 650MB CD (for streaming code/data/audio) to be a particularly "small" device. 
Well, 2.5MB RAM and a CD means the system isn't "small." But the CPU is still small. It's a 1975 device and is probably some 3-4 thousand equivalent transistors (in CMOS it's all inverters and transmission gates, really, but who's counting?). If you made that in a current tech Intel FAB it would probably (not counting pad outs and external drivers) work out to 1 micron by 1 micron in size! Smaller than a lot of bacteria. Too small to see by eye, even on an absolutely clean and polished silicon wafer.
It's so close to zero, you couldn't detect the difference.
7MHz? Cripes. That thing could be running at GHz if someone actually used a modern FAB on it. At 7MHz, it's probably being built in someone's clay oven, next to some pottery they are also making. The masks are probably hand-painted on the surface, etching done in a wash basin, and a polishing step with a Scotch-Brite pad.

(I've built small demo FABs in my garage using a nickel plated chamber and water cooling, by the way.)
Ahhh ... I'm a little more critical and less forgiving.
Of the dozen-or-so architectures that I have seen, some stand out as really well thought out, and some don't.
The expansion of the 8080 into the 8088 was a really nice piece of work. Yes, the limitations became apparent, and caused problems later ... but they seemed really well thought out for the situation that existed at the time.
I like any decent design that I learn from. Most designs have thousands of constraints imposed on them and I'm often just impressed when I see how well the engineers navigated through them. It's often pretty remarkable. In most cases, I learn something new, too.
I just took a look at the wikipedia page on the PDP-11, and yes, that JSR is a really nice idea!
It's so useful -- especially for co-routines. Nothing like it in anything today. Too bad that state of the art wasn't remembered and/or retained in at least some newer designs.
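For anyone who hasn't seen the trick: here is a minimal Python sketch of the PDP-11 JSR semantics (save the target, push the link register, copy PC into the link register, jump) and of the coroutine form JSR PC,@(SP)+, which ends up swapping PC with the top of the stack. The names `jsr` and `coroutine_swap`, the register dict, and the use of a Python list as the stack are all my own illustration, and the addresses are made up. It's a toy model, not a PDP-11 simulator.

```python
def jsr(regs, stack, link, target):
    """Model JSR link,dst once the destination address is known.

    Order of operations: push the link register, copy PC into the
    link register, then load PC with the destination.
    """
    stack.append(regs[link])   # -(SP) <- link register
    regs[link] = regs["PC"]    # link <- PC (address after the JSR)
    regs["PC"] = target        # PC <- destination


def coroutine_swap(regs, stack):
    """Model JSR PC,@(SP)+.

    The destination operand is evaluated first, which pops the other
    routine's resume address. JSR then pushes our own resume address
    and jumps, so PC and the top of the stack trade places.
    """
    target = stack.pop()       # @(SP)+ : fetch and pop the resume address
    jsr(regs, stack, "PC", target)


if __name__ == "__main__":
    # Hypothetical addresses: routine A is at 102 (just past its JSR),
    # routine B is waiting to resume at 200.
    regs = {"PC": 102}
    stack = [200]
    coroutine_swap(regs, stack)
    print(regs["PC"], stack)   # control is now in B, A's address is saved
    regs["PC"] = 204           # B runs a bit, then executes its own JSR
    coroutine_swap(regs, stack)
    print(regs["PC"], stack)   # back in A, B's address is saved
```

The point is that neither routine needs to know who called it. Each one just executes the same instruction and resumes wherever the other left off.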
The early RISC architectures are definitely interesting, and seemed like a huge change coming from CISC platforms like the 68000.
For some reason, I could never love MIPS architecture,
I remember flying in to MIPS and seeing Dr. Hennessy there. He had a huge mural behind glass of the 68020 processor's die. And he'd start there, describing how much of that die was "wasted" by sequencing logic and control store. About 70% as I recall. The rest, he'd say, was functional units and registers. But the 70% did nothing itself to add processing power. It was there just to make the instruction set "nice."
Motorola and Intel would be hide-bound before they'd sell any of their fancy FAB capacity to a competitor. (Their FABs were the most advanced and the most expensive.) MIPS could only buy "hand-me-down" FAB access, which meant roughly 150k transistor equivs when Intel and Motorola were fielding 4 million+ dies. So MIPS had to do, with 150k, what Intel and Mot were doing with millions. And MIPS DID! It was amazing to see.
MIPS stripped everything down. They had to, of course. They went straight to high clock rates, which meant high quality caching of memory, separation of instruction and data caches, the shortest possible combinatorial logic chains, and as much pipelining as possible.
They didn't want to take a hit for a branch, either. Normally, the memory system is feeding the IR (instruction register) in a separate pipe. It doesn't know anything about branches. It just loads, loads, loads, etc. When a branch takes place, the IR is already loaded with the instruction after the branch and is already decoded. So now what? Toss it away and force a pipeline stall to wait while the target is re-loaded? No way. So MIPS said, nope. We execute that instruction regardless. You don't like that? Stick a NOP there. Hard luck. Besides, adding logic to handle the stall would insert something into the critical path and lengthen the clock period. Sorry. Not happening.
Same thing with register interlocks. Register reads heading to the ALU are done in parallel with register writes. If you write a register in a prior instruction, that write will actually still be in the pipeline when the next instruction reads the register. Normal folks add 'interlocks' and use these to stall the system to allow the write to occur (if they pipeline at all). Not MIPS. You get the old value, not the newly written one. Don't like that? Insert a NOP or find something else that is useful to do. The interlock adds logic, lengthens the combinatorial logic chain, and therefore may lengthen the cycle time.
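To make the branch behavior concrete, here is a rough Python sketch of a toy machine with and without a branch delay slot. The three-instruction "ISA" (`li`, `j`, `halt`), the `run` function, and the sample program are all invented for illustration. It models only the visible effect (the instruction after the jump still executes), not the pipeline hardware.

```python
def run(program, delay_slot=True, max_steps=100):
    """Execute a toy program; return (registers, trace of executed addresses).

    With delay_slot=True, the instruction right after a jump is still
    executed before control reaches the jump target.
    """
    regs = {}
    trace = []
    pc, next_pc = 0, 1                 # current and following instruction
    for _ in range(max_steps):
        op = program[pc]
        trace.append(pc)
        after_next = next_pc + 1       # default: keep going in sequence
        if op[0] == "halt":
            break
        elif op[0] == "li":            # li reg, value
            regs[op[1]] = op[2]
        elif op[0] == "j":             # j target
            if delay_slot:
                after_next = op[1]     # slot at next_pc runs, then target
            else:
                next_pc = op[1]        # slot is discarded
                after_next = op[1] + 1
        pc, next_pc = next_pc, after_next
    return regs, trace


# Hypothetical program: address 2 sits in the delay slot of the jump.
PROGRAM = [
    ("li", "r1", 1),   # 0
    ("j", 4),          # 1
    ("li", "r2", 2),   # 2: delay slot
    ("li", "r3", 3),   # 3: skipped either way
    ("halt",),         # 4
]

if __name__ == "__main__":
    print(run(PROGRAM, delay_slot=True))
    print(run(PROGRAM, delay_slot=False))
```

With the delay slot, r2 gets written even though it sits after the jump. Without it, the machine has to throw that instruction away, which is exactly the stall the post says MIPS refused to pay for.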
Lots of decisions like that. They were working with poor-man's FABs and had to compete with what they could access. And they did such a good job of it that it forced Intel to build RISC into their x86 family. (I was there, saw it happen. Intel was scared!) You have to respect what they achieved and how they achieved it.
and Hitachi's SuperH (SH-2) is IMHO an abomination, but I recently found the NEC V810 RISC architecture (in the VirtualBoy and the PC-FX), and that seems really well thought out, to me.
The Hitachi H8 was pretty interesting. I did do some work designing boards with it and programming it. They had a nice idea, too, of making it "EPROM" compatible so that you could drop the chips into a standard EPROM programmer. And the instruction set was "pretty."
Hahaha, yep, I just took a look at the NatSemi 32000 series again with the recent release of an upgraded FPGA implementation (the M32632).
It's definitely an interesting classic-CISC processor design, but I don't think that anyone would ever use the word "elegant" in describing it.
I'd forgotten the 32k. Now that was an odd lot of stuff. The timing controller... TCU... wow. I remember spending time studying that one.