As tempting as it is, I will leave out the "bug or feature" discussion.
If there is one phrase anybody with a trade is attuned to, it is "while you are here..."; a close second, however, is anything that can easily be, or indeed is, phrased with the word "just". For the record, it is not going to get much easier or less tedious if you move to newer consoles/devices and want to change whole tracks/songs, though possibly for slightly different reasons. Single instruments/samples, sure, in some cases, but not whole songs.
On the NES being hard... well, the NES was a low-powered embedded device in an era when everybody had/needed their own custom embedded devices to get stuff done. Being low powered in an era when everybody was used to such limits meant developers tended to extend the capabilities of the device (Nintendo did allow for this), at the cost of it being slightly harder to work with (not a particular issue if you were the one who made the change).
One reference to the NES audio setup:
http://wiki.nesdev.com/w/index.php/NES_APU
A more visual one, though it technically leaves out some of the oddities:
http://www.youtube.com/watch?v=la3coK5pq5w
In the absence of something more high level (read: at emulator level), your two approaches are the ones I would be using/contemplating. Now if you are sure square 1 is never used for anything else in the game, or at least nothing else where there might be SFX, then you can figure out where its sound setup is located easily enough.
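Once you have a log of writes, finding where the square 1 setup lives mostly comes down to seeing which code writes to square 1's registers ($4000-$4003). A minimal sketch in Python, assuming you have exported the CPU writes from your emulator's debugger as (pc, address, value) tuples; that tuple format, the function name and the log contents are all made up for illustration and are not any particular emulator's output:

```python
from collections import Counter

# Square 1 (pulse 1 in the nesdev naming) occupies $4000-$4003 on the NES APU.
SQUARE1_REGS = range(0x4000, 0x4004)

def writers_of(writes, regs):
    """Count which code addresses (PC values) write to the given registers.

    `writes` is an iterable of (pc, address, value) tuples; this format is a
    hypothetical stand-in for whatever your emulator's debugger gives you.
    """
    return Counter(pc for pc, addr, _value in writes if addr in regs)

# Made-up example log: two store instructions touch square 1, one touches square 2.
log = [
    (0xE123, 0x4000, 0x9F),
    (0xE126, 0x4002, 0x40),
    (0xE123, 0x4000, 0x9A),
    (0xE200, 0x4004, 0x3F),
]
for pc, count in writers_of(log, SQUARE1_REGS).most_common():
    print(f"${pc:04X}: {count} write(s)")
```

The addresses with the most writes are the obvious first places to set a write breakpoint and start reading the surrounding disassembly.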
Square 2 might be harder, but from looking at some gameplay, if you can hold off releasing the ball there do not appear to be any extra effects. If not, then pause menus and sound tests (I did not see either for this game at this point) are often good places to hear the music on its own.
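One way to use such a quiet stretch: capture one write log while holding the ball (music only) and another while effects are playing, then look for code that only touches square 2 ($4004-$4007) in the second one. A rough sketch, using the same hypothetical (pc, address, value) log format as before, with made-up addresses and log contents:

```python
# Square 2 (pulse 2) occupies $4004-$4007 on the NES APU.
SQUARE2_REGS = range(0x4004, 0x4008)

def square2_writers(writes):
    """Set of code addresses that wrote to square 2 in a (pc, address, value) log."""
    return {pc for pc, addr, _value in writes if addr in SQUARE2_REGS}

def likely_sfx_writers(quiet_log, busy_log):
    """Code addresses that only touched square 2 while effects were playing."""
    return square2_writers(busy_log) - square2_writers(quiet_log)

# Made-up logs: music code at $E200/$E203 runs in both,
# a hypothetical effect routine at $E480/$E484 only in the busy one.
quiet = [(0xE200, 0x4004, 0x3F), (0xE203, 0x4006, 0xA0)]
busy = quiet + [(0xE480, 0x4004, 0x8F), (0xE484, 0x4007, 0x08)]
print(sorted(hex(pc) for pc in likely_sfx_writers(quiet, busy)))
```

If the sound driver funnels music and effects through one shared register-writing routine, this comes back empty, and you would have to look a level up at where the data being written comes from.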
As a first pass I would probably try disabling square 1's normal inputs and then doing a high level redirect of square 2's non-SFX stuff to square 1 (manually subtract 4 from the address to get it into square 1's registers, since square 2 sits at $4004-$4007 and square 1 at $4000-$4003, assuming the game does reach square 2 by adding that offset).
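The redirect logic itself is small. The sketch below models it in Python rather than 6502 assembly, just to show the decision made for each register write: square 1's own music writes are dropped, square 2's music writes are moved down 4 bytes into square 1, and effects and the other channels pass through. The `is_sfx` flag and the function name are hypothetical; in the actual game you would get the same effect by changing the store instructions or the index the sound routine uses.

```python
# NES APU register blocks for the two square (pulse) channels.
SQUARE1_REGS = range(0x4000, 0x4004)
SQUARE2_REGS = range(0x4004, 0x4008)
SQUARE_STRIDE = 4  # square 2's registers sit 4 bytes above square 1's

def redirect(address, value, is_sfx):
    """Return the (address, value) write to actually perform, or None to drop it.

    `is_sfx` is a hypothetical flag standing in for however you end up telling
    effect writes apart from music writes.
    """
    if is_sfx:
        return (address, value)  # effects stay on their original channel
    if address in SQUARE1_REGS:
        return None  # square 1's normal (music) writes are disabled
    if address in SQUARE2_REGS:
        return (address - SQUARE_STRIDE, value)  # square 2's music lands on square 1
    return (address, value)  # triangle, noise, DMC and so on pass through

# Made-up sequence of writes the sound code might attempt in one frame.
attempted = [
    (0x4000, 0x9F, False),  # square 1 music: dropped
    (0x4004, 0x3F, False),  # square 2 music: moved to $4000
    (0x4006, 0xA0, False),  # square 2 music: moved to $4002
    (0x4004, 0x8F, True),   # square 2 effect: left alone
]
performed = [w for w in (redirect(*a) for a in attempted) if w is not None]
for addr, val in performed:
    print(f"${addr:04X} <- ${val:02X}")
```

One of the oddities worth knowing here: the two square channels' sweep units negate slightly differently (square 1 uses ones' complement, square 2 two's complement), so a moved part that relies on pitch sweeps can come out marginally different.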
Anyway, yeah, this is going to take assembly rather than learning a format and acting in accordance with it; however, it should be a simpler kind of assembly (you are watching and changing addresses).
Redirecting the SFX instead is a viable option; however, I would probably only do it if each level/stage had its own audio tracks, and this game did not appear to in the video (some half hour) I skipped through for this.