News:

11 March 2016 - Forum Rules

Main Menu

An optimization question (MIPS assembler)

Started by tryphon, November 27, 2012, 05:20:19 PM

Previous topic - Next topic

tryphon

Hi,

during my investigations in a game code, I see often the following, to compute 44*a0 :


sll v0, a0, 1 ; v0 = 2a0
addu v0, v0, a0 ; v0 = 3a0
sll v0, v0, 2 ; v0 = 12a0
subu v0, v0, a0 ; v0 = 11a0
sll v0, v0, 2 ; v0 = 44a0


It reminds me the ol' good time of 8b processor with no mul instruction (or a mul that was so slow that playing with shifts was more efficient) and I suppose this is the kind of 'optimization' automatically generated by a C compiler, but now, is it really faster than, say, a :


addiu t0, zero, 44
multu v0, t0, a0

Zoinkity

It has to do with two things: the number of cycles until the value is available, and the number of cycles consumed executing commands. 
Unlike most other opcodes, when any integer multiply/divide is executed it takes several cycles before the result is ready to use.  It does this in the background as other operations are executed, or until you call MFHI or MFLO.  If it isn't ready yet, it eats those cycles doing nothing until the value is ready.

Quote from: From the MIPS manualIf the multiply instruction is followed by an MFHI or MFLO before the
product is available, the pipeline interlocks until this product does become
available.

Besides the operations eating a certain number of cycles, they can overlap over a certain number of cycles as well.  Also from the manual:
Table 2-2 Multiply/Divide Instruction Cycle Timing
OP Cycles Overlap
MULT 12 10
MULTU 12 10
DIV 75 0
DIVU 75 0
DMULT 20 18
DMULTU 20 18
DDIV 139 0
DDIVU 139 0


In actual fact, some commands eat more cycles than others.  For instance, the adder is one of the fastest, if not the fastest features, and many integer operations take less than a cycle.  The shifter eats a little more, shoving the next command into a new cycle but you can usually pass it along with other integer commands.  It's usually safe to assume one OP = 1 cycle on average.

There's other stipulations as well.  You need at least two OPs between reads and writes of the MFHI/LO regs, for instance.  In addition, you need to manually test for division by zero.


So, how does this affect you?
Shift commands eat, roundabout, one cycle each.  All MULT/DIV operations consume much more than that.  Notice there's a difference between number of cycles and overlap.  That doesn't factor in the MFxx OP.  Your MULTs then will consume 2 cycles + 1 for the MFxx.  Division though is much worse.  They're expensive and you have to pay for them in full. 

By comparison, shifting--especially right shifting--is much faster.  Each OP consumes only one cycle each in the worst case and there's no overlap to consider.  Addition is ridiculously fast, and you can pass several of them within a cycle.  An add+shift combo will likely fall within the same cycle.

It might seem silly though fussing over what is an insignificant number of cycles relative to the processor's speed.  However, string together a bunch of DIVs and loop it and you get the nightmare that is SCUMMVM's main menu.

There's times though that division is a lot more useful or essential, such as computing mordants.  Likewise, one advantage of MULTs are 64bit results without having to split and combine the results of Sxx/DSxx OPs.  If you don't need a MULT result any time soon it isn't so bad. 
At all other times shifting is more efficient.

tryphon

Thanks for the explanation :)

I have two questions :
- where can I find informations about the cycle and overlap for the Emotion Engine ? I have some technical stuff, but it doesn't go so much in details.

- Does PCSX2 care about these issues ? My problem (one of my problems rather :) ) is that I made some hacks on a PS2 game, that work perfectly with PCSX2, but make a real PS2 crash. Could it be related ?

Zoinkity

I'm afraid I don't know much of anything about the PSX series (N64 hacker with the good 'ol MIPS 4300) and really this was gleaned from the manual.  That said, some of this might not apply after all (but would have to a PS1 more than likely) since the pipeline is arranged differently.


In my experience emulators don't emulate low enough level to catch quirks like this.  In fact, there really aren't any cycle-accurate emulators above SNES generation.  Plus, many don't bother enforcing nuanced behaviours like this that wouldn't show up in retail games anyway. 
That said, unless you specifically call a MFHI/MFLO within two ops of the MULT/DIV you shouldn't fail at any point.  Shifting is faster but takes more OPs.


You might want to try asking about this at assemblergames; there are people doing active development that could answer specific questions like this.

tryphon

You were of great help. Thanks a lot, and thanks for the advice for assemblergames which I didn't know :)