Yes, you are absolutely right. I did the calculation that way because I only counted the times the loop is executed beyond the first pass, since it must run at least once to check HBlank. In short, the original code is faster if the loop executes only once, so I calculated the number of iterations that make my code faster.

The proper, exact calculation would be yours, which implies that the loop is executed about 11 times on average.

Looking at the original code statistically, maybe changing the multiply code could make a difference, because you are saving a lot of cycles, although I don't think those tasks are very repetitive, since multiply is used to access name tables to print items, weapons, armor and such...

`Multiplication Function`

Multiplies low byte of A * high byte of A. Stores result in 16-bit A.

C2/4781: 08 PHP

C2/4782: C2 20 REP #$20

C2/4784: 8F 02 42 00 STA $004202

C2/4788: EA NOP

C2/4789: EA NOP

C2/478A: EA NOP

C2/478B: EA NOP

C2/478C: AF 16 42 00 LDA $004216

C2/4790: 28 PLP

C2/4791: 60 RTS
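To make the routine's behaviour concrete, here is a minimal Python model of what it computes. It assumes the 16-bit store puts the low byte of A in $4202 and the high byte in $4203, and that $4216/$4217 return the 16-bit product. The name `multiply_bytes` is made up for this illustration and does not appear in the disassembly.

```python
def multiply_bytes(a16: int) -> int:
    """Model of the routine: low byte of A times high byte of A, as 16 bits."""
    a16 &= 0xFFFF                      # A is 16 bits wide after REP #$20
    multiplicand = a16 & 0xFF          # low byte, assumed to land in $4202
    multiplier = (a16 >> 8) & 0xFF     # high byte, assumed to land in $4203
    # The largest product is 255 * 255 = 65025, so it always fits in 16 bits.
    return (multiplicand * multiplier) & 0xFFFF


if __name__ == "__main__":
    # A = $0C0D multiplies $0D by $0C.
    print(hex(multiply_bytes(0x0C0D)))
```

For example, A = $0C0D gives 13 * 12 = 156, so the model returns $009C.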

**You could axe that last NOP and it would take 34 cycles, as opposed to the original 39, or your 45.**

Why could I axe the last NOP? Isn't hardware multiplication supposed to take 8 machine cycles? I just checked, and all the $4202 multiplications wait 6 machine cycles, not the 8 I always thought was the correct number.
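A rough way to sanity-check these figures is to tally nominal cycle counts for the listing. The sketch below rests on assumptions that are mine, not measurements: the per-instruction counts are the usual 65816 table values for a 16-bit accumulator, memory-speed differences are ignored, and `LDA long` is assumed to spend 4 cycles fetching its opcode and three address bytes before it actually reads $4216.

```python
# Nominal 65816 cycle counts with a 16-bit accumulator (assumed table values;
# differences in memory access speed are ignored).
CYCLES = {
    "PHP": 3,
    "REP": 3,
    "STA long": 6,
    "NOP": 2,
    "LDA long": 6,
    "PLP": 4,
    "RTS": 6,
}

# Assumption: LDA long fetches its opcode and 3 address bytes before reading data.
LDA_LONG_FETCH_CYCLES = 4


def routine_cycles(nops: int) -> int:
    """Total nominal cycles for the routine with the given number of NOPs."""
    fixed = ("PHP", "REP", "STA long", "LDA long", "PLP", "RTS")
    return sum(CYCLES[name] for name in fixed) + nops * CYCLES["NOP"]


def wait_before_read(nops: int) -> int:
    """Cycles between the end of the STA and the first read of $4216."""
    return nops * CYCLES["NOP"] + LDA_LONG_FETCH_CYCLES


if __name__ == "__main__":
    for nops in (4, 3):
        print(nops, routine_cycles(nops), wait_before_read(nops))
```

Under those assumptions, three NOPs total 34 cycles, which agrees with the figure quoted above, and the read of $4216 still comes 10 cycles after the store. That would explain why 6 cycles of NOPs can be enough even if the multiplier needs 8.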

By the way, I just discovered some 24-bit multiplications in bank $C3, in these routines:

* $C3/6D79

* $C3/8CD6

I'll post my full relocatable disassembly when it's done.