Are there any tutorials on how to assemble new code and then insert it? I've looked online but can't find very much info on this. Do I write the new assembly, run it through a cross-assembler, and then insert the result into the NMI routine?
I can walk you through the process. At least 75% of it is merely designing a new routine to do (1) what you want, (2) when you want it. The rest is examining the NMI Routine for a decent insertion point, then manually typing in the new code via Hex Editor. No assemblers, disassemblers, capdapplers and/or schmendlers will be necessary!
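For the curious, here's roughly what that hex-editor step boils down to, written as a small Python sketch rather than anything you actually need to run. Everything in it is made up for illustration: the `patch_bytes` helper, the 16-byte dummy "ROM", the offset, and the `$C000` jump target. Only the byte encoding of `JSR` (opcode `$20`, then the address low byte first) is real 6502.

```python
def patch_bytes(rom, offset, new_code):
    """Return a copy of `rom` with `new_code` overwriting the bytes at `offset`."""
    if offset < 0 or offset + len(new_code) > len(rom):
        raise ValueError("patch does not fit inside the ROM image")
    patched = bytearray(rom)
    # Overwrite in place: same length in, same length out, just like typing over bytes.
    patched[offset:offset + len(new_code)] = new_code
    return bytes(patched)

# Stand-in "ROM": 16 bytes of NOP (opcode $EA). A real file would be read from disk.
dummy_rom = bytes([0xEA] * 16)

# JSR $C000 assembles to 20 00 C0. The target address $C000 is hypothetical.
jsr_c000 = bytes([0x20, 0x00, 0xC0])

patched_rom = patch_bytes(dummy_rom, 4, jsr_c000)
print(patched_rom.hex(" "))
```

The point is just that a patch overwrites bytes without changing the file's length. Finding the right offset in a real ROM is the hard part, and this sketch doesn't attempt it.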
Concept Time: NES games are displayed by an electron gun firing a beam against a picture tube, which draws the onscreen image left-to-right, top-to-bottom. Once all 240 lines are drawn, a special signal is raised inside the NES to mark that the electron gun is resetting itself for the next framedraw. This is called V-Blank, and it's a very special moment in time. (If you want to use "scanline" as a unit of time, V-Blank lasts about 20 such units on an NTSC console.) The NES outputs roughly 60 frames per second.
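To put a number on how short V-Blank is, here's a back-of-the-envelope calculation in Python. It assumes the usual NTSC figures (262 scanlines per frame, 341 PPU dots per scanline, 3 PPU dots per CPU cycle) and a V-Blank of 20 scanlines. The function names are my own, and PAL consoles have different timing.

```python
# NTSC timing constants (assumed; PAL consoles differ).
PPU_DOTS_PER_SCANLINE = 341
PPU_DOTS_PER_CPU_CYCLE = 3
SCANLINES_PER_FRAME = 262

def cpu_cycles_for_scanlines(scanlines):
    """Approximate number of CPU cycles that elapse over `scanlines` scanlines."""
    return scanlines * PPU_DOTS_PER_SCANLINE / PPU_DOTS_PER_CPU_CYCLE

def vblank_budget(vblank_scanlines=20):
    """Approximate CPU cycles available during V-Blank (length is an assumption)."""
    return cpu_cycles_for_scanlines(vblank_scanlines)

if __name__ == "__main__":
    print("CPU cycles per V-Blank:", round(vblank_budget()))
    print("CPU cycles per frame:  ", round(cpu_cycles_for_scanlines(SCANLINES_PER_FRAME)))
```

Under those assumptions the budget comes out to a bit under 2,300 CPU cycles out of nearly 30,000 per frame, and most 6502 instructions cost between 2 and 7 cycles. Whatever you add has to fit in that window alongside the code that's already there.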
NES games operate by following a Main Program Loop. During framedraw, the MPL runs a series of checks to determine what's happening in-game at that precise moment, and what needs to be done to keep it happening according to the in-universe "rules". Once it gets thru this checklist, it goes into a sort of hover pattern until the framedraw is complete and that special V-Blank signal is sent.
When that happens, the NES activates something called the Non-Maskable Interrupt (NMI), which is a fancy way of saying "Ultra-special checklist of important shit to do between frames". Onscreen images can only be updated by writing to the NES' Picture Processing Unit (PPU), and that can only safely happen during V-Blank (or while rendering is switched off entirely). Thus, NMI Routines are designed to take advantage of this critical period. If designed correctly, they'll end with time to spare and the MPL will begin its checklist prior to the electron gun starting the next framedraw. (This is why most games put their music/sound handlers at or near the very beginning of the MPL.)
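If it helps to see the handshake laid out, here's a toy Python model of it. This is not NES code and not how any particular game does it. The class `ToyNES`, its method names, and the `nmi_done` flag are all invented for illustration, and the "hardware" firing the NMI is faked by a direct function call.

```python
class ToyNES:
    """Toy model of the main-loop / NMI handshake. Not real NES code."""

    def __init__(self):
        self.nmi_done = False  # set by the NMI routine, cleared by the main loop
        self.log = []

    def main_checklist(self, frame):
        # Stands in for the game logic the MPL runs during framedraw.
        self.log.append(f"frame {frame}: MPL checklist")

    def nmi_routine(self, frame):
        # Stands in for the between-frames work, such as PPU updates.
        self.log.append(f"frame {frame}: NMI updates PPU")
        self.nmi_done = True

    def run(self, frames):
        for frame in range(frames):
            self.nmi_done = False
            self.main_checklist(frame)
            # The "hover pattern": spin until the NMI routine has run.
            # Real hardware fires the NMI at V-Blank; here it is called directly.
            while not self.nmi_done:
                self.nmi_routine(frame)
        return self.log


if __name__ == "__main__":
    for line in ToyNES().run(2):
        print(line)
```

The thing to notice is the ordering: the checklist always finishes first, then the loop idles, and only then does the NMI work happen. New code inserted into the NMI routine runs in that second slot, once per frame.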
I know this is a lot to digest at once, but once you have, the forthcoming procedure will make a lot more sense.