For normal DMA, every game should go through the cycle-stealing process (to simulate correct VBlank timing).
For HDMA we're supposed to steal cycles as well, but I don't know how many (probably not important).
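
To make the cycle-stealing idea concrete, here is a rough C sketch. It is not Goomba's actual code: the `Cpu` struct, `dma_copy_steal`, and `STALL_PER_BLOCK` are all made-up names, and the cost of 8 cycles per 16-byte block is a placeholder to be tuned, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulator state; these names are illustrative only. */
typedef struct {
    uint32_t cycles_left;   /* CPU cycles remaining in the current scanline */
} Cpu;

/* ASSUMPTION: placeholder stall cost charged per 16-byte block. */
#define STALL_PER_BLOCK 8u

/* Copy `len` bytes, then take the bus-busy time away from the CPU's budget.
 * Returns the number of cycles stolen. */
static uint32_t dma_copy_steal(Cpu *cpu, uint8_t *dst, const uint8_t *src,
                               uint32_t len)
{
    assert(cpu && dst && src);

    for (uint32_t i = 0; i < len; i++)
        dst[i] = src[i];

    uint32_t blocks = (len + 15u) / 16u;          /* round up to whole blocks */
    uint32_t stolen = blocks * STALL_PER_BLOCK;

    /* Clamp at zero so the scanline budget never underflows. */
    cpu->cycles_left = (stolen > cpu->cycles_left) ? 0u
                                                   : cpu->cycles_left - stolen;
    return stolen;
}
```

With these placeholder numbers, a 32-byte copy would take 16 cycles out of the scanline budget, so the CPU reaches VBlank with less work done, which is the effect the stealing is meant to simulate.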
Shantae updates sprites during HBlank, which
is a smart way to use HDMA. A double-speed HDMA transfer moves
32 bytes; single-speed mode copies 16 bytes.
-- This is likely where Shantae broke down: it's a double-speed game.
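
A rough sketch of one HBlank step, in case it helps pin down where the speed mode matters. The `Hdma` struct and both function names are hypothetical, and the 16/32 block sizes are simply the figures quoted above, not checked against hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical HDMA state; names are illustrative, not from Goomba. */
typedef struct {
    uint16_t src;        /* next source offset */
    uint16_t dst;        /* next destination offset */
    uint16_t remaining;  /* bytes still to copy */
    int      active;     /* nonzero while a transfer is pending */
} Hdma;

/* ASSUMPTION: block sizes as stated in the post (unverified). */
static uint16_t hdma_block_size(int double_speed)
{
    return double_speed ? 32u : 16u;
}

/* Run once per HBlank: copy one block and return how many bytes moved. */
static uint16_t hdma_hblank_step(Hdma *h, uint8_t *vram, const uint8_t *rom,
                                 int double_speed)
{
    assert(h && vram && rom);
    if (!h->active)
        return 0;

    uint16_t n = hdma_block_size(double_speed);
    if (n > h->remaining)
        n = h->remaining;            /* last block may be short */

    for (uint16_t i = 0; i < n; i++)
        vram[h->dst + i] = rom[h->src + i];

    h->src += n;
    h->dst += n;
    h->remaining -= n;
    if (h->remaining == 0)
        h->active = 0;               /* transfer finished */
    return n;
}
```

If the emulator always used the single-speed size, a double-speed game would get only half of its data across per HBlank under this model, which would fit sprites not updating in time.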
EDIT: I will be moving the Goomba discussion to
https://github.com/EvilJagaGenius/jagoombacolor
Sorry about that!