It has to do with two things: the number of cycles until the value is available, and the number of cycles consumed executing commands.
Unlike most other opcodes, an integer multiply or divide takes several cycles before its result is ready to use. The work continues in the background while other operations execute, until you call MFHI or MFLO. If the result isn't ready at that point, the processor eats the remaining cycles doing nothing until it is. From the manual:
If the multiply instruction is followed by an MFHI or MFLO before the
product is available, the pipeline interlocks until this product does become available.
Besides eating a certain number of cycles, these operations can overlap with other instructions for some of those cycles. Also from the manual:
Table 2-2 Multiply/Divide Instruction Cycle Timing
OP        Cycles   Overlap
MULT      12       10
MULTU     12       10
DIV       75       0
DIVU      75       0
DMULT     20       18
DMULTU    20       18
DDIV      139      0
DDIVU     139      0
In actual fact, some commands eat more cycles than others. For instance, the adder is one of the fastest features, if not the fastest, and many integer operations take less than a cycle. The shifter eats a little more, shoving the next command into a new cycle, but you can usually pass it along with other integer commands. It's usually safe to assume one OP = 1 cycle on average.
There are other stipulations as well. You need at least two OPs between reads and writes of the HI/LO regs, for instance. In addition, you need to test for division by zero manually.
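As a rough illustration of the manual division-by-zero test, here is a minimal C sketch. The helper name `checked_divu` is hypothetical and not from the manual. It models an unsigned divide (the DIVU case), where the quotient ends up in LO and the remainder in HI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: unsigned divide guarded by an explicit zero check,
 * mirroring the manual test you would place around a DIVU.
 * Returns false and leaves the outputs untouched when the divisor is 0. */
bool checked_divu(uint32_t n, uint32_t d, uint32_t *quot, uint32_t *rem)
{
    if (d == 0) {
        return false;   /* caller decides what to do about it */
    }
    *quot = n / d;      /* the value MFLO would return after DIVU */
    *rem  = n % d;      /* the value MFHI would return after DIVU */
    return true;
}
```

The caller checks the return value before touching the quotient or remainder, much as hand-written assembly would branch around the divide.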
So, how does this affect you?
Shift commands eat, roundabout, one cycle each. All MULT/DIV operations consume much more than that. Notice there's a difference between the number of cycles and the overlap: that difference is what you pay up front, and it doesn't factor in the MFxx OP. Your MULTs will then consume 2 cycles + 1 for the MFxx, provided you fill the overlap with other work. Division, though, is much worse. DIVs are expensive, they have no overlap, and you have to pay for them in full.
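The arithmetic above can be sketched as a small cost model in C. This is only an estimate built on the reading given here (cycles minus overlap, plus one for the MFxx) and on the one OP = 1 cycle rule of thumb. It is not a cycle-accurate simulation, and the names `MulDivTiming`, `find_timing`, `wasted_cycles`, and `effective_cost` are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One row of Table 2-2. */
typedef struct {
    const char *op;
    int cycles;   /* cycles until the result is available */
    int overlap;  /* cycles that can overlap other instructions */
} MulDivTiming;

static const MulDivTiming TIMINGS[] = {
    {"MULT",   12, 10}, {"MULTU",  12, 10},
    {"DIV",    75,  0}, {"DIVU",   75,  0},
    {"DMULT",  20, 18}, {"DMULTU", 20, 18},
    {"DDIV",  139,  0}, {"DDIVU", 139,  0},
};

/* Look up a row by mnemonic; returns NULL for anything not in the table. */
const MulDivTiming *find_timing(const char *op)
{
    for (size_t i = 0; i < sizeof TIMINGS / sizeof TIMINGS[0]; i++) {
        if (strcmp(TIMINGS[i].op, op) == 0) {
            return &TIMINGS[i];
        }
    }
    return NULL;
}

/* Cycles spent doing nothing useful, assuming `filler` independent
 * one-cycle OPs sit between the multiply/divide and the MFxx. */
int wasted_cycles(const MulDivTiming *t, int filler)
{
    if (filler < 0) {
        filler = 0;
    }
    int blocked  = t->cycles - t->overlap;  /* cannot be hidden */
    int unfilled = t->overlap - filler;     /* overlap left empty */
    if (unfilled < 0) {
        unfilled = 0;
    }
    return blocked + unfilled;
}

/* Wasted cycles plus one for the MFHI/MFLO itself. */
int effective_cost(const MulDivTiming *t, int filler)
{
    return wasted_cycles(t, filler) + 1;
}
```

Under these assumptions, `effective_cost(find_timing("MULT"), 10)` gives the 2 + 1 figure quoted above, while a DIV costs 76 no matter how much filler you supply.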
By comparison, shifting, especially right shifting, is much faster. Each OP consumes only one cycle in the worst case and there's no overlap to consider. Addition is ridiculously fast, and you can pass several of them within a cycle. An add+shift combo will likely fall within the same cycle.
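To make the shift and add+shift idea concrete, here is a short C sketch that replaces a multiply by a constant and a divide by a power of two. The function names are illustrative only, and the examples stick to unsigned values, since a right shift of a negative signed value does not round the same way a divide does.

```c
#include <stdint.h>

/* x * 10 rewritten as x * 8 + x * 2: two shifts and one add, no MULT. */
uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* x / 8 for unsigned values: a single right shift, no DIVU. */
uint32_t div8(uint32_t x)
{
    return x >> 3;
}

/* x % 8 for unsigned values: mask off the low three bits. */
uint32_t mod8(uint32_t x)
{
    return x & 7u;
}
```

The same pattern works for any constant that breaks down into a few powers of two. Once the constant needs many terms, the chain of shifts and adds stops paying for itself.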
It might seem silly, though, fussing over what is an insignificant number of cycles relative to the processor's speed. However, string together a bunch of DIVs, loop it, and you get the nightmare that is ScummVM's main menu.
There are times, though, when division is a lot more useful or essential, such as computing mordants. Likewise, one advantage of MULTs is 64-bit results without having to split and combine the results of Sxx/DSxx OPs. If you don't need a MULT result any time soon, it isn't so bad.
At all other times shifting is more efficient.
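For reference, the 64-bit result that a 32-bit MULT hands you can be modelled in C as follows. This is a sketch only: the `HiLo` struct and `mult_hi_lo` name are invented here, and the two fields stand in for the high and low 32-bit words you would fetch with MFHI and MFLO.

```c
#include <stdint.h>

/* Stand-in for the HI/LO register pair after a 32-bit signed MULT. */
typedef struct {
    uint32_t hi;  /* upper 32 bits of the product (MFHI) */
    uint32_t lo;  /* lower 32 bits of the product (MFLO) */
} HiLo;

HiLo mult_hi_lo(int32_t a, int32_t b)
{
    /* Widen first so the full 64-bit signed product is kept. */
    int64_t product = (int64_t)a * (int64_t)b;

    /* Split via an unsigned copy to avoid shifting a negative value. */
    uint64_t bits = (uint64_t)product;

    HiLo r;
    r.hi = (uint32_t)(bits >> 32);
    r.lo = (uint32_t)(bits & 0xFFFFFFFFu);
    return r;
}
```

Getting the same two words out of shifts and adds means tracking the carry between the halves by hand, which is the splitting and combining a single MULT saves you.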