That paragraph was more of a curio, and a natural follow-on from the rest if we were still talking about systems using 3D. It was meant more as an "if you are curious, then..." than an "I am too lazy to type it out", though I take your points.
I was (and am) impressed by the sheer scope of your response. I couldn't really add anything that you hadn't at least started to say something about. I could expand, but what would be the point? The "halting problem" was the only "which part isn't like the others?" for me. That doesn't make it wrong; I just wasn't sure why it was there.
If you find a good explanation (the branch prediction that the Core 2 coverage brought to the fore works well here), it covers code flow and execution paths pretty well, or at least does a fair job of bridging high-level loops/branches with low-level concepts. Recursion aside, loops aren't as bad as pointers for messing with the heads of new programmers, but I have seen people struggle. And if HTML makes up a large part of the OP's programming experience, that kind of work wouldn't have covered loops much at all.
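A minimal sketch of the kind of bridge I mean, in Java (the class and method names are just for illustration): the same loop written the high-level way, and then restructured into the compare-and-branch shape a compiler lowers it to, with the rough machine-level moves noted in comments.

```java
public class LoopAsBranches {
    // The high-level version: a plain counted loop.
    static int sumTo(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // The same loop restructured to show the low-level shape:
    // a label, a conditional branch out, and a jump back.
    static int sumToLowLevel(int n) {
        int total = 0;
        int i = 0;
        loop:                             // label, i.e. a jump target
        while (true) {
            if (!(i < n)) break loop;     // cmp i, n ; branch to exit
            total += i;                   // add total, i
            i++;                          // inc i
        }                                 // jmp back to loop
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10));          // 45
        System.out.println(sumToLowLevel(10));  // 45
    }
}
```

Both versions do identical work; the second just makes the branch explicit, which is the part the hardware's branch predictor has to guess about on every iteration.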
I'm working heavily on writing scalable software. In broad strokes, that means a 4-core CPU provides twice the performance of a 2-core one, and a 16-core CPU provides eight times the performance. That sounds "obvious" to those without much experience here, but almost everyone using .NET and threads really has no clue. They imagine that using separate threads does the trick. It doesn't, especially if they protect any important data structure with a lock of some kind. All that really happens on a very expensive server board sporting four 16-core CPUs is that 64 cores sit tied up at the same serializing bottleneck, waiting on the same resource, and the whole thing performs not much better than a single core.

Look up software transactional memory to see part of what I'm doing with x86 servers. There are other important theoretical aspects (data structure design figures extremely importantly here), but if you design for scalability, you actually can achieve quite a lot. I know of only a handful of people doing this, though. Most just use the usual .NET facilities, honestly have no clue at all, and as a result their products don't really perform when you throw lots of cores at them. It takes a new mindset to get that to work well.
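This isn't my actual STM code, but here's a minimal Java sketch of the serializing-lock problem, and of one lock-avoiding alternative the JDK already ships: `java.util.concurrent.atomic.LongAdder`, which stripes its state across per-thread cells so contending threads rarely fight over the same cache line. (The counter scenario and the thread counts are my own illustration, not anything from the original discussion.)

```java
import java.util.concurrent.atomic.LongAdder;

public class ContentionSketch {
    static final Object lock = new Object();
    static long lockedCounter = 0;
    static final LongAdder stripedCounter = new LongAdder();

    public static void main(String[] args) throws InterruptedException {
        final int threads = 8, perThread = 100_000;

        // Version 1: every increment funnels through one lock.
        // However many cores you have, they all queue at this bottleneck.
        Thread[] a = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            a[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) { lockedCounter++; }  // serialized
                }
            });
            a[t].start();
        }
        for (Thread th : a) th.join();

        // Version 2: LongAdder spreads updates across internal cells,
        // so threads mostly update independent memory and scale out.
        Thread[] b = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            b[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) stripedCounter.increment();
            });
            b[t].start();
        }
        for (Thread th : b) th.join();

        System.out.println(lockedCounter);        // 800000
        System.out.println(stripedCounter.sum()); // 800000
    }
}
```

Both versions compute the same total; the difference only shows up in throughput under contention, which is exactly the point about scalability: correctness is easy, scaling is a design problem.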
[I worked on chipset testing and CPU design at Intel, so I'm pretty well aware of branch prediction and most of the other details operating in the x86 family, up to the P2 anyway. Those parts did single-cycle parallel decode of up to three x86 instructions, converting them into RISC-like micro-ops, and used a re-order buffer (ROB), out-of-order execution, branch prediction, and multiple functional units that could run in parallel (two integer ALUs, two floating-point units, etc.), plus the machinery for "retiring" results in order. (And some of the "approaches" used to deal with sad, earlier design limitations: the push/pop floating-point register stack, for example, which was a serious bottleneck in itself.) There's a lot more to all this, but probably no point writing about it here.]
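Since branch prediction came up twice, here's the classic way to see it from high-level code (my own toy example, not anything from the Intel work): a data-dependent branch over random values mispredicts roughly half the time, while the same branch over sorted values is almost always predicted correctly, so on real hardware the sorted pass runs noticeably faster even though the arithmetic is identical. I've left timing out of the code since it varies by machine; the point is that the work is provably the same.

```java
import java.util.Arrays;
import java.util.Random;

public class BranchDemo {
    // Sums elements >= 128. The "if" becomes a conditional branch:
    // unpredictable on random input, near-perfectly predicted on sorted input.
    static long sumBig(int[] data) {
        long total = 0;
        for (int v : data) {
            if (v >= 128) total += v;   // the branch the predictor guesses
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 20];
        Random rng = new Random(42);                 // fixed seed, repeatable
        for (int i = 0; i < data.length; i++) data[i] = rng.nextInt(256);

        long unsortedSum = sumBig(data);
        Arrays.sort(data);                           // same values, predictable branch
        long sortedSum = sumBig(data);

        // Same result either way; only the misprediction rate differs.
        System.out.println(unsortedSum == sortedSum);
    }
}
```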
Anyway, things are interesting today. I'm enjoying all this.