@snesfanboi
Maybe it would be best to do 2 separate patches: a FastROM patch and a
FastROM plus optimizations patch. The goal is to remove all slowdowns,
so the further we progress the more the game is going to play like
the SA-1 patch. When we get all of the slowdowns out I'll see about
balancing the gameplay so it's not too hard. I don't really have
any interest in the Final Fight series, but if someone gets a disassembly
dump of it similar to the one I did for Gradius 3, then someone
might consider making some optimizations to the code. The disassembled
code is very important for making these kinds of modifications. The
Gradius 3 disassembly took a while, but it was well worth it.
@Aaendi
I'm currently working on dissecting the whole function $02/96DF-$02/99E2.
Not sure if you've made any progress on figuring out exactly how that array
works yet, but I'm making progress on it and should have it fully figured
out soon. While I was dissecting the function I couldn't help but try out
a few optimizations. Let me know if any of them are useful.
This code looked like it could use an optimization.
$82/9819 E6 18 INC $18 [$00:0018] A:08E8 X:0C00 Y:08E8 D:0000 DB:7E S:1DE6 P:nvmbdIzc V:011 H:0504 F:12 C:07
$82/981B E6 18 INC $18 [$00:0018] A:08E8 X:0C00 Y:08E8 D:0000 DB:7E S:1DE6 P:NvmbdIzc V:011 H:0594 F:12 C:07
$82/981D E6 18 INC $18 [$00:0018] A:08E8 X:0C00 Y:08E8 D:0000 DB:7E S:1DE6 P:NvmbdIzc V:011 H:0644 F:12 C:07
$82/981F E6 18 INC $18 [$00:0018] A:08E8 X:0C00 Y:08E8 D:0000 DB:7E S:1DE6 P:NvmbdIzc V:011 H:0694 F:12 C:07
I wrote this in its place.
$82/9819 A9 04 00 LDA #$0004 A:0AA4 X:0C00 Y:0AA4 D:0000 DB:7E S:1DE1 P:nvmbdIzC V:008 H:0068 F:25 C:03
$82/981C 18 CLC A:0004 X:0C00 Y:0AA4 D:0000 DB:7E S:1DE1 P:nvmbdIzC V:008 H:0086 F:25 C:02
$82/981D 65 18 ADC $18 [$00:0018] A:0004 X:0C00 Y:0AA4 D:0000 DB:7E S:1DE1 P:nvmbdIzc V:008 H:0098 F:25 C:04
$82/981F 85 18 STA $18 [$00:0018] A:E2C1 X:0C00 Y:0AA4 D:0000 DB:7E S:1DE1 P:NvmbdIzc V:008 H:0126 F:25 C:04
The original is 28 cycles and the new code is 13 cycles.
You can use the same optimization for this code and save 1 cycle. Just change the value of the LDA to 0x0002.
$82/98AA E6 18 INC $18 [$00:0018] A:08D6 X:0C00 Y:08D6 D:0000 DB:7E S:1DE3 P:nvmbdIzC V:055 H:0108 F:30 C:07
$82/98AC E6 18 INC $18 [$00:0018] A:08D6 X:0C00 Y:08D6 D:0000 DB:7E S:1DE3 P:NvmbdIzC V:055 H:0158 F:30 C:07
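For anyone who wants to check the arithmetic, here is a rough Python sketch. The cycle costs are the C: values from the traces above and the byte sizes are counted from the opcode bytes shown in those traces; the names are made up for illustration. Keep in mind the replacement overwrites A and the carry flag, which INC does not, so it is worth confirming both are unused at that point before patching.

```python
# Cycle costs read from the C: column of the traces in this post.
INC_DP_CYCLES = 7                    # INC $18
REPLACEMENT_CYCLES = [3, 2, 4, 4]    # LDA #imm16, CLC, ADC $18, STA $18

# Byte sizes counted from the opcode bytes shown in the traces:
# E6 18 / A9 04 00 / 18 / 65 18 / 85 18
INC_DP_BYTES = 2
REPLACEMENT_BYTES = 3 + 1 + 2 + 2


def compare(inc_count):
    """Return (old_cycles, new_cycles, old_bytes, new_bytes) for a run of INCs."""
    return (
        inc_count * INC_DP_CYCLES,
        sum(REPLACEMENT_CYCLES),
        inc_count * INC_DP_BYTES,
        REPLACEMENT_BYTES,
    )


for n in (4, 2):
    old_c, new_c, old_b, new_b = compare(n)
    print(f"{n} x INC: {old_c} cycles / {old_b} bytes -> "
          f"{new_c} cycles / {new_b} bytes")
```

For the 2 x INC case the saving is a single cycle, and the replacement is 8 bytes against 4, so it only fits if there is room to move the code that follows.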
Here is another one that can be optimized.
$02/9771 B5 12 LDA $12,x [$00:7C00] A:0300 X:61AE Y:004E D:1A40 DB:01 S:1DF3 P:eNvmxdIzcHC:21150 VC:000 00 FL:00
$02/9773 DA PHX A:0300 X:61AE Y:004E D:1A40 DB:01 S:1DF3 P:eNvmxdIzcHC:21158 VC:000 00 FL:00
$02/9774 0A ASL A A:0300 X:61AE Y:004E D:1A40 DB:01 S:1DF3 P:eNvmxdIzcHC:21166 VC:000 00 FL:00
$02/9775 AA TAX A:0300 X:61AE Y:004E D:1A40 DB:01 S:1DF3 P:eNvmxdIzcHC:21174 VC:000 00 FL:00
$02/9776 BF 80 97 02 LDA $029780,x[$02:F92E] A:0300 X:61AE Y:004E D:1A40 DB:01 S:1DF3 P:eNvmxdIzcHC:21182 VC:000 00 FL:00
$02/977A FA PLX A:0300 X:61AE Y:004E D:1A40 DB:01 S:1DF3 P:eNvmxdIzcHC:21190 VC:000 00 FL:00
$02/977B 85 00 STA $00 [$00:1A40] A:0300 X:61AE Y:004E D:1A40 DB:01 S:1DF3 P:eNvmxdIzcHC:21198 VC:000 00 FL:00
$02/977D 6C 00 00 JMP ($0000)[$00:0000] A:0300 X:61AE Y:004E D:1A40 DB:01 S:1DF3 P:eNvmxdIzcHC:21206 VC:000 00 FL:00
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
This is data, not ASM. It is the address table for the jump above.
$02/9780-$02/9787 data location
88 97
F7 97
32 99
BC 99
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
I wrote this in its place.
$82/9771 B5 12 LDA $12,x [$00:2C5A] A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0016 VC:000 00 FL:41148
$82/9773 F0 13 BEQ $13 [$9788] A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0022 VC:000 00 FL:41148
$82/9775 3A DEC A A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0028 VC:000 00 FL:41148
$82/9776 F0 09 BEQ $09 [$9781] A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0034 VC:000 00 FL:41148
$82/9778 3A DEC A A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0040 VC:000 00 FL:41148
$82/9779 F0 03 BEQ $03 [$977E] A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0046 VC:000 00 FL:41148
$82/977B 4C BC 99 JMP $99BC [$81:99BC] A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0052 VC:000 00 FL:41148
$82/977E 4C 32 99 JMP $9932 [$81:9932] A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0058 VC:000 00 FL:41148
$82/9781 4C F7 97 JMP $97F7 [$81:97F7] A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0064 VC:000 00 FL:41148
$82/9784 EA NOP A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0070 VC:000 00 FL:41148
$82/9785 EA NOP A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0076 VC:000 00 FL:41148
$82/9786 EA NOP A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0082 VC:000 00 FL:41148
$82/9787 EA NOP A:0003 X:2C48 Y:1F3F D:0000 DB:81 S:1DE3 P:envmxdIzCHC:0088 VC:000 00 FL:41148
Here is how many cycles each jump takes.
$82/9771 B5 12 LDA $12, X [$00:0CD2] A:9771 X:0CC0 Y:0070 D:0000 DB:81 S:1DE3 P:NvmbdIzc V:020 H:1006 F:44 C:05
$82/9773 F0 13 BEQ $13 [$9788] A:0000 X:0CC0 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIZc V:020 H:1040 F:44 C:03 = 8 //$82:9788
$82/9771 B5 12 LDA $12, X [$00:0C12] A:9771 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:NvmbdIzc V:009 H:0212 F:45 C:05
$82/9773 F0 13 BEQ $13 [$9788] A:0001 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:009 H:0246 F:45 C:02
$82/9775 3A DEC A A:0001 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:009 H:0258 F:45 C:02
$82/9776 F0 09 BEQ $09 [$9781] A:0000 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIZc V:009 H:0270 F:45 C:03
$82/9781 4C F7 97 JMP $97F7 [$81:97F7] A:0000 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIZc V:009 H:0288 F:45 C:03 = 15 //$82:97f7
$82/9771 B5 12 LDA $12, X [$00:0C12] A:9771 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:NvmbdIzc V:027 H:0278 F:13 C:05
$82/9773 F0 13 BEQ $13 [$9788] A:0002 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:027 H:0312 F:13 C:02
$82/9775 3A DEC A A:0002 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:027 H:0324 F:13 C:02
$82/9776 F0 09 BEQ $09 [$9781] A:0001 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:027 H:0336 F:13 C:02
$82/9778 3A DEC A A:0001 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:027 H:0348 F:13 C:02
$82/9779 F0 03 BEQ $03 [$977E] A:0000 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIZc V:027 H:0360 F:13 C:03
$82/977E 4C 32 99 JMP $9932 [$81:9932] A:0000 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIZc V:027 H:0378 F:13 C:03 = 19 //$82:9932
$82/9771 B5 12 LDA $12, X [$00:0C12] A:9771 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:NvmbdIzc V:011 H:0458 F:11 C:05
$82/9773 F0 13 BEQ $13 [$9788] A:0003 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:011 H:0492 F:11 C:02
$82/9775 3A DEC A A:0003 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:011 H:0504 F:11 C:02
$82/9776 F0 09 BEQ $09 [$9781] A:0002 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:011 H:0516 F:11 C:02
$82/9778 3A DEC A A:0002 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:011 H:0528 F:11 C:02
$82/9779 F0 03 BEQ $03 [$977E] A:0001 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:011 H:0580 F:11 C:02
$82/977B 4C BC 99 JMP $99BC [$81:99BC] A:0001 X:0C00 Y:0070 D:0000 DB:81 S:1DE3 P:nvmbdIzc V:011 H:0592 F:11 C:03 = 18 //$82:99bc
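A quick way to convince yourself the branch chain lands in the same places as the old table is to model both in Python. This is only a sketch: the table bytes are the ones listed for $02/9780, the cycle numbers are the C: values from the traces above, and the function names are hypothetical.

```python
# Jump table at $02/9780-$02/9787 (little-endian words) as listed in this post.
TABLE_BYTES = bytes([0x88, 0x97, 0xF7, 0x97, 0x32, 0x99, 0xBC, 0x99])


def table_dispatch(state):
    """Original code: ASL A / TAX / LDA $029780,x / STA $00 / JMP ($0000)."""
    i = state * 2
    return TABLE_BYTES[i] | (TABLE_BYTES[i + 1] << 8)


def chain_dispatch(state):
    """Replacement code: BEQ / DEC A / BEQ / DEC A / BEQ / JMP chain.

    Returns (target, cycles), with cycles summed from the C: column.
    """
    cycles = 5                          # LDA $12,x
    if state == 0:
        return 0x9788, cycles + 3       # BEQ taken, lands on $9788
    cycles += 2 + 2                     # BEQ not taken, DEC A
    if state == 1:
        return 0x97F7, cycles + 3 + 3   # BEQ taken, JMP $97F7
    cycles += 2 + 2                     # BEQ not taken, DEC A
    if state == 2:
        return 0x9932, cycles + 3 + 3   # BEQ taken, JMP $9932
    return 0x99BC, cycles + 2 + 3       # BEQ not taken, JMP $99BC


for state in range(4):
    target, cycles = chain_dispatch(state)
    print(f"state {state}: ${target:04X} in {cycles} cycles, "
          f"table says ${table_dispatch(state):04X}")
```

One difference the model makes visible: any value above 3 falls through to $99BC in the chain, while the table version would index past its last entry. That only matters if $12,x can ever hold something other than 0-3.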
This code can be further optimized by moving the most used jump to the spot that uses the
fewest cycles. These jumps ($02/977D 6C 00 00 JMP ($0000)[$00:0000]) are used in a lot of places, and if they have 8 or fewer jumps they can be optimized with this new code. So far I have tested this new jump code at these locations.
$82/9728 //function for 128x128 bubbles
$82/9754 //function for 64x64 bubbles
$82/9771 //function for 32x32 bubbles
$82/978E //function for 16x16 bubbles
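To get a feel for how much the ordering could matter, here is a small Python sketch. The slot costs are the cycle totals from the traces above; the call counts are HYPOTHETICAL numbers picked for illustration, not measured from the game. It also assumes the slot costs stay the same after reordering, which may not hold exactly once the BEQ tests are rewritten.

```python
# Cycle cost of each slot in the chain, in chain order (from the traces above).
SLOT_COSTS = [8, 15, 19, 18]


def average_cycles(counts, costs):
    """Weighted average dispatch cost for the given per-handler call counts."""
    return sum(n * c for n, c in zip(counts, costs)) / sum(counts)


def best_assignment(counts):
    """Pair the busiest handlers with the cheapest slots.

    Returns a list of (handler_index, slot_cost), busiest handler first.
    """
    by_use = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    return list(zip(by_use, sorted(SLOT_COSTS)))


# HYPOTHETICAL call counts for handlers 0-3 (not measured from the game).
counts = [10, 20, 60, 10]

before = average_cycles(counts, SLOT_COSTS)
after = sum(counts[i] * cost for i, cost in best_assignment(counts)) / sum(counts)
print(f"average cycles per dispatch: {before} -> {after}")
```

Real counts would have to come from logging how often each target is hit during a slowdown-heavy section.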
Here is another one that I optimized, but I maxed out the character limit for this post
so I can't show the new code. There were a lot of them to change.
$82/815D A5 03 LDA $03 [$00:0003] A:00ED X:0C00 Y:0030 D:0000 DB:81 S:1DD9 P:envmxdIzcHC:1138 VC:008 00 FL:42835
$82/815F 8D 02 42 STA $4202 [$81:4202] A:ED10 X:0C00 Y:0030 D:0000 DB:81 S:1DD9 P:eNvmxdIzcHC:1166 VC:008 00 FL:42835
$82/8162 EA NOP A:ED10 X:0C00 Y:0030 D:0000 DB:81 S:1DD9 P:eNvmxdIzcHC:1196 VC:008 00 FL:42835
$82/8163 EA NOP A:ED10 X:0C00 Y:0030 D:0000 DB:81 S:1DD9 P:eNvmxdIzcHC:1208 VC:008 00 FL:42835
$82/8164 EA NOP A:ED10 X:0C00 Y:0030 D:0000 DB:81 S:1DD9 P:eNvmxdIzcHC:1220 VC:008 00 FL:42835
$82/8165 EA NOP A:ED10 X:0C00 Y:0030 D:0000 DB:81 S:1DD9 P:eNvmxdIzcHC:1232 VC:008 00 FL:42835
$82/8166 AD 16 42 LDA $4216 [$81:4216] A:ED10 X:0C00 Y:0030 D:0000 DB:81 S:1DD9 P:eNvmxdIzcHC:1244 VC:008 00 FL:42835
There are at least 2 different STZs that could be used instead of these 4 NOPs
to fill the 8 cycles you have to wait for the result.
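Here is a small Python model of what that multiply sequence does, as a sketch only. It assumes the 16-bit STA $4202 puts the low byte of A in $4202 and the high byte in $4203 (A is 16-bit in the trace), uses the A value ED10 shown in the trace, and takes the 8 cycle wait from this post. NOP is 2 CPU cycles; any cost you plug in for an STZ should come from your own trace. The function names are hypothetical.

```python
NOP_CYCLES = 2     # NOP is 2 CPU cycles on the 65816
WAIT_CYCLES = 8    # wait quoted in this post before reading $4216


def multiply_from_trace(a16):
    """Model a 16-bit STA $4202: low byte -> $4202, high byte -> $4203.

    Returns the 16-bit product that would be read back from $4216.
    """
    multiplicand = a16 & 0xFF
    multiplier = (a16 >> 8) & 0xFF
    return multiplicand * multiplier


def covers_wait(filler_cycles):
    """True if the filler instructions add up to at least the required wait."""
    return sum(filler_cycles) >= WAIT_CYCLES


print(hex(multiply_from_trace(0xED10)))
print(covers_wait([NOP_CYCLES] * 4))
```

Any instructions that were going to run anyway can stand in for the NOPs, as long as they do not depend on the product and their cycles add up to the wait.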
I laughed when I saw this code. There is no way someone wrote this by hand, is there?
And no, the code isn't trying to use up cycles to get results like the multiplication code.
$02/97A8 4C AB 97 JMP $97AB [$02:97AB]
May 22, 2019, 12:29:49 am - (Auto Merged - Double Posts are not allowed before 7 days.)
@Aaendi
I have most of that array figured out. This is the collision data for 32x32 size bubbles.
This data appears to be only for bubbles colliding with bubbles.
3x3 grid of 8 pixel squares (24x24 pixels) minus the center square (square #5)
example:
ooo
oxo
ooo
7E FF 07 00 //1st 2 bytes = collision square #1 location, 2nd 2 bytes is for direction to bounce off of bubble
80 FF 04 00 //1st 2 bytes = collision square #2 location
82 FF 01 00 //1st 2 bytes = collision square #3 location
FE FF 02 00 //1st 2 bytes = collision square #4 location
02 00 02 00 //1st 2 bytes = collision square #6 location, 0x02 0x00 = left, right, opposite
7E 00 01 00 //1st 2 bytes = collision square #7 location, 0x01 0x00 = upper/right, lower/left, opposite
80 00 04 00 //1st 2 bytes = collision square #8 location, 0x04 0x00 = up, down, opposite
82 00 07 00 //1st 2 bytes = collision square #9 location, 0x07 0x00 = upper/left, lower/right, opposite
FF FF //end of collision size and direction
Also, if you set the 2nd 2 bytes (the bounce direction) to 0x00 0x00, then
the 32x32 bubbles won't collide with each other. It appears the 2nd 2 bytes
have a max of 0x7; that comes from looking at the ASM, not the data.
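If it helps to see the table as grid offsets, here is a Python sketch that decodes it. It ASSUMES the first word of each entry is a signed 16-bit little-endian byte offset into a collision array that is 64 squares wide at 2 bytes per square (128 bytes per row); that reading fits the numbers above but is not confirmed from the ASM. The function and constant names are made up.

```python
import struct

ROW_STRIDE = 64 * 2   # assumed: 64 squares per row, 2 bytes per square

# 32x32 bubble collision table exactly as listed above.
TABLE_32 = bytes.fromhex(
    "7EFF0700 80FF0400 82FF0100 FEFF0200 "
    "02000200 7E000100 80000400 82000700 FFFF"
)


def decode(table):
    """Return a list of (d_row, d_col, direction) up to the FF FF terminator."""
    entries = []
    pos = 0
    while True:
        (offset,) = struct.unpack_from("<h", table, pos)
        if offset == -1:                      # FF FF ends the table
            break
        (direction,) = struct.unpack_from("<H", table, pos + 2)
        # Split the byte offset into the nearest row step plus a column step.
        d_row = (offset + ROW_STRIDE // 2) // ROW_STRIDE
        d_col = (offset - d_row * ROW_STRIDE) // 2
        entries.append((d_row, d_col, direction))
        pos += 4
    return entries


for d_row, d_col, direction in decode(TABLE_32):
    print(f"row {d_row:+d}, col {d_col:+d}, direction 0x{direction:02X}")
```

Read this way, the eight entries are the eight neighbours of the center square, with the same direction value on diagonally opposite corners.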
Here is the 16x16 bubble collision data
$81:E2DF
00 00 06 00 //collision square #1 location, 0x06 00 = up/down, left/right, opposite
FF FF
RAM used for colliding bubbles is 7E:C000-7E:CFFF. It is a 64x32 collision array of 8x8 squares, 2 bytes per square.
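The size works out: 64 x 32 squares at 2 bytes each is 0x1000 bytes, which is exactly C000-CFFF. A minimal Python sketch of the addressing, ASSUMING the array is stored row by row (the 0x80 step between rows in the offset tables suggests this, but it is an inference); the function name is hypothetical.

```python
BASE = 0x7EC000
COLS, ROWS, BYTES_PER_SQUARE = 64, 32, 2


def square_address(col, row):
    """Address of one 8x8 collision square, assuming row-major layout."""
    if not (0 <= col < COLS and 0 <= row < ROWS):
        raise ValueError("square is outside the 64x32 array")
    return BASE + (row * COLS + col) * BYTES_PER_SQUARE


print(hex(COLS * ROWS * BYTES_PER_SQUARE))
print(hex(square_address(0, 0)), hex(square_address(63, 31)))
```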
Here is the 64x64 bubble collision data.
 00000
00xxx00
0xxxxx0
0xxxxx0
0xxxxx0
00xxx00
 00000
$01:E25B
7C FE 07 00 //collision square #1
7E FE 04 00 //collision square #2
80 FE 04 00 //collision square #3
82 FE 04 00 //collision square #4
84 FE 01 00 //collision square #5
FA FE 07 00 //collision square #6
FC FE 07 00 //collision square #7
04 FF 01 00 //collision square #11
06 FF 01 00 //collision square #12
7A FF 02 00 //collision square #13
86 FF 02 00 //collision square #19
FA FF 02 00 //collision square #20
06 00 02 00 //collision square #26
7A 00 02 00 //collision square #27
86 00 02 00 //collision square #33
FA 00 01 00 //collision square #34
FC 00 01 00 //collision square #35
04 01 07 00 //collision square #39
06 01 07 00 //collision square #40
7C 01 01 00 //collision square #41
7E 01 04 00 //collision square #42
80 01 04 00 //collision square #43
82 01 04 00 //collision square #44
84 01 07 00 //collision square #45
FF FF
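As a cross-check on the diagram, here is a Python sketch that turns the 64x64 table back into a picture. It ASSUMES each entry's first word is a signed 16-bit little-endian byte offset into an array with 128 bytes per row (64 squares x 2 bytes); the names are made up for illustration.

```python
import struct

ROW_STRIDE = 128   # assumed bytes per row of the collision array

# 64x64 bubble collision table exactly as listed above.
TABLE_64 = bytes.fromhex("""
7CFE0700 7EFE0400 80FE0400 82FE0400 84FE0100
FAFE0700 FCFE0700 04FF0100 06FF0100
7AFF0200 86FF0200
FAFF0200 06000200
7A000200 86000200
FA000100 FC000100 04010700 06010700
7C010100 7E010400 80010400 82010400 84010700
FFFF
""")


def ring_cells(table):
    """Return the set of (d_row, d_col) squares listed before the terminator."""
    cells = set()
    for pos in range(0, len(table) - 2, 4):
        (offset,) = struct.unpack_from("<h", table, pos)
        d_row = (offset + ROW_STRIDE // 2) // ROW_STRIDE
        d_col = (offset - d_row * ROW_STRIDE) // 2
        cells.add((d_row, d_col))
    return cells


def render(cells, radius=3):
    """Draw collision squares as 0, squares enclosed by them as x."""
    lines = []
    for r in range(-radius, radius + 1):
        cols = [c for (row, c) in cells if row == r]
        line = ""
        for c in range(-radius, radius + 1):
            if (r, c) in cells:
                line += "0"
            elif cols and min(cols) < c < max(cols):
                line += "x"
            else:
                line += " "
        lines.append(line)
    return lines


print("\n".join(render(ring_cells(TABLE_64))))
```

Under that assumption the 24 entries form a ring on a 7x7 grid with the four corners left out.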
I also have the 128x128 bubble collision data mapped, but my post's text is maxed out so
I can't post it yet.
