
Author Topic: slidelljohn (a.k.a.[J]) snes projects page  (Read 19262 times)

slidelljohn

« on: January 01, 2019, 10:46:14 pm »
Welcome To My Projects Page!

I currently have 2 important projects that need to be finished and released.

Project #1 Gradius 3 (fast rom about 99% complete, sram support work in progress)

I have a nearly complete Gradius 3 fast rom. All of the assembly is done; there is just some data that still needs
to be moved to fast rom locations, but that shouldn't really affect the speed much. Slowdown is definitely noticeably
reduced, but the patch still doesn't remove all of it. Here is the patch:

http://www.mediafire.com/file/cg9yg9tpce03ytk/Gradius_III_%2528U%2529_%255B%2521%255D_%257BFast_Rom%257D.rar/file

I also added extra difficulties that you can unlock by holding the L, R, Y, and B buttons at the same time when you go into the
option menu. There are 4 new difficulties: elite, master, expert and pro. Normal, hard and arcade are now slightly easier.
Elite difficulty is the same as the original arcade difficulty, while master, expert, and pro get gradually harder, with pro as the
hardest.



lastdual helped me with this patch by checking for bugs that I might have missed. There could still be some bugs, so let me
know if you find any. I really need someone to try to beat the hardest level to see if there are any bugs in the gameplay.

My main goal for gradius 3 is to remove all slowdown, add sram and make it feel like a competition cart. I also want to add
a new system for calculating your score after each boss, kind of similar to super punch out. Fast rom alone will never remove all
of the slowdown, so I have been learning about the sa-1 chip. I do have a gradius 3 rom that is updated to use the sa-1 chip but it
does not load gradius 3's assembly yet. I have been gradually changing the assembly code to store all wram data in bw-ram so that
the sa-1 chip can work properly; this is nearly complete. When I finish the wram to bw-ram change I'll start working on getting
the sa-1 to load most of the assembly. I plan on having the snes cpu live in wram so the two cpus don't collide and I can get the full
speed out of the sa-1 chip. I have no clue how long this is going to take, but I chose gradius 3 for this because it is a small rom.
I hope everyone enjoys the fast rom patch and the extra difficulties.





Project #2 New snes mod chip called Super V-Power! (v = vram) (100% working but could still be improved)

This mod chip is for adding extra vram to the snes. The snes has 64kb of vram but the ppu is capable of using 128kb of vram.
Creating this mod chip kind of just fell in my lap. I did not figure out that the ppu can use the extra vram, and I'm not really sure who
did, but byuu made higan v1.00 and up capable of using the extra vram because he said it should be possible. You can see how I ended
up creating this mod chip on byuu's forum. Byuu's forum is gone but it has been completely archived here:

http://helmet.kafuka.org/byuubackup2/viewtopic.php@f=16&t=1559.html

All of the mod chip stuff is at the end of that page, but you can see how it ended up getting made by reading everything from the
beginning. I also modified a snes debugger to show the extra vram in the vram viewer, so there is a debugger that supports the extra
graphics. The link to the debugger is on that 1st page. Qwertymodo was a really big help in showing me how to create the castellated
holes to mount the pcb to the snes, and he helped me with how and where to send the pcb to get it made. I designed the entire pcb myself
in eagle cad. I'm getting ready to create an updated version of the super v-power mod chip and when it's finished I'll upload
all of my schematics. I'm not interested in making any money off of the mod chip; it will be fully open source for anyone to make if
they want to. I do have a few mod chips that I'm willing to give away for free to people that have the skills and want to create hacks
that use the extra vram. If anyone has a fully working level editor for a snes game and would like to add this feature to their editor,
I would be willing to modify the game that uses their editor to use the 128kb of vram. I already have a mmx v1.1 rom that fully supports
the extra vram, but I need to write an editor from scratch to be able to use it. As far as I can tell from searching online, I am the only person
to try to get 128kb of vram to work on the snes. I was trying to get Analogue to have their Super Nt support the extra vram but I
didn't have any luck, probably because nothing uses the extra vram yet. One day they might add support for it.

Here is my mod chip installed on the snes motherboard:



If I can get the sa-1 hack completed it would set me up for a project that will blow away all the other projects that I have worked on:
mega man x with sa-1, 128kb vram and a new engine to have gameplay like super metroid.



Moderators:
Hopefully it was/is ok for me to talk about this chip here because it could be a game changer for future snes rom hacks. If I broke any
rules let me know and it won't happen again and I'll delete the Project #2 post.




« Last Edit: January 02, 2019, 01:10:16 am by slidelljohn »

niuus

« Reply #1 on: January 04, 2019, 02:37:53 pm »
Wow. I find your mod chip project one of the most interesting things I've seen in a long time on the SNES scene (since the MSU-1, for me). Please, more info! I've read the thread, and it does sound really amazing.  :beer:
« Last Edit: January 04, 2019, 03:01:43 pm by niuus »

darkmoon2321

« Reply #2 on: January 04, 2019, 08:54:24 pm »
Interesting.  Gradius III is a fun game that I played through a few times back in the day.  The game would slow to a crawl every time one of those big serpent creatures was on-screen though.  Out of curiosity, have you done any kind of analysis on the code to figure out why it is lagging so badly in the first place?  I haven't looked into the game's code myself, though I might take a look now.  My guess is that it would be significantly easier to correct some inefficient code rather than try to convert over to SA-1.  If it's an issue of graphics decompression, there is also the route of placing decompressed graphics directly in the ROM and bypassing the decompression routine, particularly for commonly used images.

slidelljohn

« Reply #3 on: January 04, 2019, 09:20:17 pm »
More info? You got it 8)

I have some more pcb components on the way from China. As soon as they come in (about 2 weeks) I’ll be able to see if I can make the upgrade that I would like to do. I’m going to try to split the pcb into 2 pcbs. On one pcb I’m going to try to put header pins to mount it around the cpu and the ppu1, and it will have all of the resistors, transistors, status leds, a couple of switches and a connector for a flat ribbon cable. The other pcb will only have the two ram chips and a connector for a flat ribbon cable. On the second pcb I will also drop the castellated holes for a bga setup. I think this setup will make everything a lot cleaner and more professional looking. This 2 pcb setup probably won’t work on the 1st released motherboard that has the vertical ram chips, because you have to solder directly to the pins on the cpu and ppu, but I might be able to figure something out for that one.

With my current setup for the pcb using the cpu unused i/o ports I haven’t found any games that don’t work properly. Without using those ports from the cpu there are games that don’t work properly with the major one being super mario world 2.

All of the major details are in the link to byuu’s archived forum. I have good pictures for mega man x1 showing what the original 64kb of vram looks like and what the upgraded 128kb of vram looks like and it has the patch for mmx1 and the nSide debugger to create hacks for it.

Currently the only games I plan on hacking with the extra vram are mega man x1 and the 7th saga. The only other games I’ll probably add the 128kb of vram to are ones that someone has a level editor for and is willing to add support for the extra vram.

darkmoon2321:
Yes, I looked into some of it but I still need to do more research on that. From what I was seeing, the slowdown is caused by the algorithm for the sprites. Yes, removing compression does speed things up, but not really during gameplay; it mostly helps when levels are loading. I have mmx1 and 7th saga roms with all graphics compression removed, and gameplay still slows down but level loading is way faster. I already have 100% of gradius 3 asm disassembled and most of the data mapped out, so interesting things are being worked on to improve the game.

I could release the disassembled asm for gradius 3 but I’m not really sure what the right way is to release it. Can I release a text file with all of the asm on romhacking.net without getting in trouble or do I need to write a program that rips it out of the game?
« Last Edit: January 04, 2019, 09:32:35 pm by slidelljohn »

darkmoon2321

« Reply #4 on: January 04, 2019, 10:30:19 pm »
Quote from: slidelljohn
I could release the disassembled asm for gradius 3 but I’m not really sure what the right way is to release it. Can I release a text file with all of the asm on romhacking.net without getting in trouble or do I need to write a program that rips it out of the game?

I'm pretty sure you can release it here.  There are a few complete disassembly documents on the site already.  I don't see any for SNES titles yet though, so that could be a first.  I did go ahead and get a trace log on some of the lag, and I can see where it happens, but you're probably miles ahead of me if you already have a full disassembly.

Edit: I did a little more research into this.  The sprite drawing code does consume a lot of processing time, and I think it is possible to make it run more efficiently.  However, that's not the only issue.  On the lag trace that I got, the game didn't even reach the sprite drawing routine before it was interrupted by the next NMI.  It appears that collision detection is occupying the largest chunk of time, in my test case approximately half of an entire frame.
« Last Edit: January 05, 2019, 05:12:01 pm by darkmoon2321 »

lastdual

« Reply #5 on: January 05, 2019, 07:36:22 pm »
Good luck with that vram project! It sounds really ambitious :beer:

I've done a number of side-by-side tests with the Gradius 3 fastrom hack and the improvement is noticeable, though as slidelljohn stated there is still slowdown. However, when combining the hack with an emulator that allows for overclocking (such as the Snes9X 2010 Libretro core for RetroArch), the game is virtually slowdown-free. It's pretty awesome after all these years to play one of the first 16-bit shmups the way it was meant to be played. Gradius 3 is a personal favorite of mine (I prefer the SNES version--that soundtrack is gold), and seeing it running so smooth (also flicker-free using the above emulator) is a real treat.

slidelljohn

« Reply #6 on: January 07, 2019, 10:05:27 pm »
Quote from: darkmoon2321
I'm pretty sure you can release it here.  There are a few complete disassembly documents on the site already.  I don't see any for SNES titles yet though, so that could be a first.  I did go ahead and get a trace log on some of the lag, and I can see where it happens, but you're probably miles ahead of me if you already have a full disassembly.

Edit: I did a little more research into this.  The sprite drawing code does consume a lot of processing time, and I think it is possible to make it run more efficiently.  However, that's not the only issue.  On the lag trace that I got, the game didn't even reach the sprite drawing routine before it was interrupted by the next NMI.  It appears that collision detection is occupying the largest chunk of time, in my test case approximately half of an entire frame.

I'm still not sure about the rules for releasing the full disassembly but I'll probably end up writing a program that can extract it. I do have the main function for gradius 3 that I used to pinpoint the slowdown that I found.
Here is a text file of the main function:
http://www.mediafire.com/file/36ld5ftx8tq24b5/gradius_3_main_function.txt/file

In the main function, nop out this line ($80/82A2 22 F1 8E 80 JSL $808EF1[$80:8EF1]), reload the cart from the beginning and
just let the game play until the ship crashes. It should crash around 2 minutes and 31 seconds. If you let the game play
without removing that line, the ship will crash around 2 minutes and 46 seconds. So just that one line of code eats up 15 seconds.
I'm not sure if the collision detection is called from that line, but I recall 2 or 3 sections of code that show the slowdown.
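
As a rough check on those numbers, here is a small Python sketch. The two times are the ones quoted above; treating the whole gap as time spent in that one call is an assumption, not a measurement.

```python
# Times until the demo ship crashes, taken from the post above.
with_call = 2 * 60 + 46      # seconds with the JSL $808EF1 call in place
without_call = 2 * 60 + 31   # seconds with the call nop'ed out

saved = with_call - without_call   # seconds attributed to the call
share = saved / with_call          # fraction of the unmodified run

print(saved)
print(round(share * 100, 1))
```

So the call would account for roughly 9% of the run up to that point, if the gap is all slowdown.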

I definitely wouldn't mind teaming up to try and make the code more efficient. I would think there is room for improvement, especially when the programmers do stuff like this
Code: [Select]
$80/8223 C2 20       REP #$20
$80/8225 C2 10       REP #$10

instead of this
Code: [Select]
$00/0000 C2 30       REP #$30
I do have a lot of stuff that I am working on right now so I am not sure when I'll have a chance to look at the code again
but improving the code is definitely something that I would like to do.
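
On the REP example above: REP clears the status bits that are set in its operand, so the two-instruction and one-instruction forms should leave the flags identical while the single REP #$30 is 2 bytes shorter. A minimal Python model of just that flag behaviour (the function name rep and the starting value are made up for illustration):

```python
def rep(p, mask):
    """Model of the 65816 REP instruction: clear the status bits set in mask."""
    return p & ~mask & 0xFF

FLAG_M = 0x20  # accumulator width flag (clear = 16-bit accumulator)
FLAG_X = 0x10  # index width flag (clear = 16-bit X and Y)

p = 0x34  # arbitrary example status value with M, X and I set

two_steps = rep(rep(p, FLAG_M), FLAG_X)   # REP #$20 then REP #$10
one_step = rep(p, FLAG_M | FLAG_X)        # REP #$30

print(two_steps == one_step)
```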

Quote from: lastdual
Good luck with that vram project! It sounds really ambitious :beer:

I've done a number of side-by-side tests with the Gradius 3 fastrom hack and the improvement is noticeable, though as slidelljohn stated there is still slowdown. However, when combining the hack with an emulator that allows for overclocking (such as the Snes9X 2010 Libretro core for RetroArch), the game is virtually slowdown-free. It's pretty awesome after all these years to play one of the first 16-bit shmups the way it was meant to be played. Gradius 3 is a personal favorite of mine (I prefer the SNES version--that soundtrack is gold), and seeing it running so smooth (also flicker-free using the above emulator) is a real treat.
Thanks! I still haven't tried RetroArch. I guess all the hacking I do eats up all of my free time. :D

I got one of the parts that I was waiting on to see if I can improve the super v-power mod chip, and I came up with something even better. I wanted to split the chip into 2 chips so I wouldn't have to solder wires directly to the board, but as I was testing everything I came up with the idea of having one pcb wrap around the cpu and the ppu so there would be no wires and no ribbon cables. This will only be possible by getting a custom injection molded header pin that connects to the cpu pins, ppu pin, and all of the vram pins. I found a company that can make the custom header pin for me but it's going to take some time to get all of the correct measurements for the holes. If I can get this done I can add a bonus to the super v-power mod chip: I can add the super cic chip as well, because it also has a hole for every pin of the cic chip. Imagine an all-in-one 128kb vram and super cic. This is turning out way better than I expected.

Here is an image showing all of the holes for all of the pins for the 2 vram chips and the cic chip. The linked image is large:
https://imgur.com/ONDw8XG

Here is a paper cutout to give everyone an idea of the shape of the new pcb design with the super cic added:
https://imgur.com/KCgeDEM

After I get the pcb shape and the holes correctly mapped out in eagle cad I'll see about getting the custom header pins made.
My schematics and custom header pins will all be open source when they are ready.

One of the bsnes plus developers is currently working on adding the super v-power capabilities to the bsnes plus debugger so
hopefully soon we will all have a really good debugger for future rom hacks that can use this new unused hidden feature for the original snes console. :woot!:

« Last Edit: January 07, 2019, 11:34:59 pm by slidelljohn »

darkmoon2321

« Reply #7 on: January 08, 2019, 11:48:05 am »
Quote from: slidelljohn
In the main function if this line ($80/82A2 22 F1 8E 80 JSL $808EF1[$80:8EF1]) is nop'ed reload the cart from the beginning and
just let the game play until the ship crashes. It should crash around 2 minutes and 31 seconds. Now if you let the game play
without removing that line the game will crash around 2 minutes and 46 seconds. So just that one line of code eats up 15 seconds.
Not sure if the collision detection is loaded from that line but I recall 2 or 3 sections of code that shows the slowdown.

That function does indeed call the sprite handling code.  I wrote a simple program that parses through Geiger's trace logs and counts the number of times each line of code executes so that I can find good targets for optimizing code.  The collision detection code that I identified before spans 80/EA7A to 80/EC3C.  From the main function, it is reached the line before the sprites, so ($80/829E 22 8E 87 80 JSL $80878E).  I'm not entirely sure if collision detection is all this code does, but it appears to be a significant part of it.  If you're interested in the short program I made, source is here: https://www.dropbox.com/s/0x3p0p9hsp3dmcm/geiger_trace_frequency_counter.cpp?dl=0
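
For anyone who can't get at the C++ source, the counting idea is simple enough to sketch. This is not darkmoon2321's program, just a minimal Python version of the same idea. It assumes each trace line starts with an address written like "$80/82A2", as in the lines quoted in this thread; the function name count_addresses and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: every trace line begins with "$BB/AAAA" (bank / address).
ADDRESS = re.compile(r"^\$([0-9A-Fa-f]{2})/([0-9A-Fa-f]{4})")

def count_addresses(lines):
    """Return a Counter mapping 'BB/AAAA' to how many times it was executed."""
    counts = Counter()
    for line in lines:
        match = ADDRESS.match(line.strip())
        if match:
            key = f"{match.group(1).upper()}/{match.group(2).upper()}"
            counts[key] += 1
    return counts

# Hypothetical three-line trace, reusing instructions quoted in this thread.
sample = [
    "$80/829E 22 8E 87 80 JSL $80878E",
    "$80/82A2 22 F1 8E 80 JSL $808EF1[$80:8EF1]",
    "$80/829E 22 8E 87 80 JSL $80878E",
]

hottest = count_addresses(sample).most_common(1)
print(hottest)
```

On a real log you would pass the open file instead of the sample list and look at the top few dozen entries of most_common() to find candidates for optimizing.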

I can take a quick look at optimizing the sprite display code a bit, I have a better understanding of that than the collision code at the moment.

slidelljohn

« Reply #8 on: January 08, 2019, 07:13:45 pm »
I can't seem to download that file from dropbox. The web page just shows a white screen. It says
the server is encrypted.

I think this is all of the assembly that is used for the sprites at $80:8EF1:
http://www.mediafire.com/file/h444w9o6k5vm64d/gradius_3_sprite_assembly_0x008ef1.txt/file

I have some notes on some lines in the text file but they are not complete as I was still in the process
of documenting them before I took a break on this project.

« Last Edit: January 08, 2019, 07:26:32 pm by slidelljohn »

ExL

« Reply #9 on: January 10, 2019, 01:00:25 am »
Use this link instead - https://www.dropbox.com/s/0x3p0p9hsp3dmcm/geiger_trace_frequency_counter.cpp?dl=1
"dl=0" prevented you from downloading. But just in case I've reuploaded it here: https://mega.nz/#!7V0hUIpB!BfP0jc1Hx2qDfP0ivSh6JGblVSHSMEvgSRcAwNjuMOk
I'm just passing by, with no knowledge on subject, but that looks very interesting ::) I hope snes9x and other emulators will add support for Super V-Power and that'll lead to lagless hacks comfortably played everywhere :thumbsup:
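
The fix above amounts to flipping the dl query parameter on the link. A small Python sketch of that rewrite, using only the standard library (the helper name force_download is made up for illustration):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def force_download(url):
    """Return url with its dl query parameter set to 1."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))

link = "https://www.dropbox.com/s/0x3p0p9hsp3dmcm/geiger_trace_frequency_counter.cpp?dl=0"
print(force_download(link))
```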

slidelljohn

« Reply #10 on: January 10, 2019, 07:31:37 am »
Thanks ExL! I didn’t know what was going on with that link. ;D
The Super V-Power mod chip isn’t actually for helping with lag but a creative programmer could speed up some things with it.

I sent the new pcb design for the Super V-Power mod chip to the manufacturer but I’m going to have to do a few tests before the new version is ready to show off. Hopefully within the next 2 months I’ll have something ready to show. The new design is gonna be crazy! :o
« Last Edit: January 10, 2019, 07:39:16 am by slidelljohn »

darkmoon2321

« Reply #11 on: January 10, 2019, 02:23:35 pm »
Sorry about the link, never had that issue with dropbox before.  I did some work on optimizing the sprite routine:

https://www.mediafire.com/file/6ec2cavwn9vuvd3/Gradius3_spriteOptimization.ips/file

Apply this patch on top of your fastROM patch.  It reduces the amount of time spent computing individual sprite tiles by about 10% or so.  Further optimization is possible, but it would likely be at the expense of rom space and provide benefit mainly for large sprites containing lots of tiles.  In other words, creating separate routines for sprites that are vertically flipped, horizontally flipped, both, or neither.  This would eliminate the branch checking for these conditions that occurs for individual tiles right now.

slidelljohn

« Reply #12 on: January 10, 2019, 04:32:42 pm »
About 10%, that’s great!  :thumbsup:
I don’t have time to look at it today but I’ll definitely look at it tomorrow. I may have to put some of my other projects on hold and work on this one some more. I’ll look more into optimizing the asm code, expanding the rom, and removing all compression this weekend. I’m also going to submit the complete disassembled assembly tomorrow.

I was able to test your patch right before I left to go to work and it looks like it’s around a .7 second speed up from when the ship first crashes if you let the game play on its own. So yea, definitely around a 10% speed up on top of the fastrom patch. Good work! I am interested in what you did to speed it up but I would probably hold off on the details just in case I can come up with something different to add to it.

Update 1-12-19:
I just now submitted 2 complete gradius 3 assembly documents and they can be seen in the Submission Queue Status Page. Hopefully within 24-48 hours they are accepted. The 2 gradius 3 assembly documents are for banks $00 and $02. Those 2 banks have the assembly for the whole game.  If these get accepted I may start uploading more assembly documents that I have for mega man x v1.1 and the 7th saga.
« Last Edit: January 12, 2019, 08:22:25 am by slidelljohn »

Aaendi

« Reply #13 on: March 06, 2019, 01:20:15 pm »
Can I see the before and after of the original sprite ASM code, and the improved sprite ASM code?  I'm interested in this project.

I checked the "collision code" and it doesn't look like collision code to me.  It looks more like a general physics engine. It might have collision somewhere but I didn't find it.

edit:

Here's more optimizations to add to it:

Code: [Select]
arch snes.cpu

macro seek(n) {
origin (({n} & 0x7f0000) >> 1) | ({n} & 0x7fff)
base {n}
}
seek($8096d5)
-;
ldx $00
ldy $0000,x
inx
inx
stx $00
jsl $808ea0
lda $f6
beq +
sty $f2
+;
stz $02
ldx $00

-;
lda $0000,x
inx
and #$00ff
cmp #$00ff
beq label_0
cmp #$00fe
beq label_1
cmp #$00fd
beq label_2
ldy $f0
beq +
lda #$0000
+;
ora $04
stx $00
jsl $808ed8
ldx $00
lda $f6
beq -
stx $f4
lda $04
sta $f8
inc $f2
lda $0000,x
and #$00ff
cmp #$00ff
bne +
label_0:
stz $f0
stz $f6
+;
stx $00
jml $808ee3


label_2:
lda $0000,x
sta $05
inx
bra -

label_1:
stx $00
jsl $808ee3
bra --





seek($808f6e)
lda $0000,x
-;
lsr
lsr
bcc -
sta $0000,x


inx
inx
cpx #$3e20
beq +
-;
stz $0000,x
inx
inx
cpx #$3e20
bne -
+;

cpy #$3d80
bcs +
sep #$20
lda #$f0
-;
sta $0001,y
iny
iny
iny
iny
cpy #$3d80
bcc -
+;
rep #$20
lda $92
ora $66
ora $12f8
sta $18

-;
tya
lsr
and #$0006
tax
lda $8180ca,x
cmp $0002,y
beq +
sta $0002,y
tya
lsr
lsr
and #$001f
tax
sep #$20
lda $8180d2,x
xba
tya
asl
eor #$50
rep #$20
sta $0000,y
+;


lda $18
bne big_jump

tya
and #$001c
tax
lda $8180aa,x
and $1e
bne +
sep #$20
lda $0000,y
dec
bra ++
+;
sep #$20
lda $0000,y
+;
clc
adc $8180ac,x
sta $0000,y
rep #$20
big_jump:

iny
iny
iny
iny
cpy #$3e00
bcc -
sep #$20
lda #$01
pha
plb
rep #$20
rtl
« Last Edit: March 10, 2019, 09:18:37 am by Aaendi »
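
A note on the seek macro in the code above: its origin line turns a SNES address into an offset in the rom file, which matches the usual LoROM layout for a headerless file. A Python sketch of the same arithmetic, run on addresses that appear in that code (the function name lorom_file_offset is made up for illustration):

```python
def lorom_file_offset(address):
    """Same arithmetic as the origin line of the seek macro."""
    return ((address & 0x7F0000) >> 1) | (address & 0x7FFF)

# Addresses taken from the patch: two seek targets and one table in bank $81.
for snes_address in (0x8096D5, 0x808F6E, 0x8180CA):
    offset = lorom_file_offset(snes_address)
    print(f"${snes_address:06X} -> file offset 0x{offset:06X}")
```

Each bank contributes 0x8000 bytes to the file, which is why the bank bits are shifted right by one before being combined with the low 15 address bits.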

Samus12345

« Reply #14 on: March 11, 2019, 11:52:23 pm »
Some info on darkmoon's sprite routine hack: It introduces glitchy graphics on the fire stage boss (the two-headed dragon) that aren't in the original fastrom hack.

Aaendi

« Reply #15 on: March 12, 2019, 03:34:18 pm »
I noticed that this loop happens pretty often, and I wonder if there is any way to remove the loads and stores from $fc and just keep it in the X register.  I would optimize this code myself, but I found out that this is in the middle of Darkmoon's code.

Code: [Select]
8090a4 lda $fc       [0000fc] A:0000 X:0700 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIZc V:165 H: 312
8090a6 clc                    A:0700 X:0700 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIzc V:165 H: 340
8090a7 adc #$0040             A:0700 X:0700 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIzc V:165 H: 352
8090aa tax                    A:0740 X:0700 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIzc V:165 H: 370
8090ab cpx $18       [000018] A:0740 X:0740 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIzc V:165 H: 382
8090ad bcc $9082     [809082] A:0740 X:0740 Y:3c18 S:1de9 D:0000 DB:7e NvmxdIzc V:165 H: 410
809082 stx $fc       [0000fc] A:0740 X:0740 Y:3c18 S:1de9 D:0000 DB:7e NvmxdIzc V:165 H: 428
809084 lda $00,x     [000740] A:0740 X:0740 Y:3c18 S:1de9 D:0000 DB:7e NvmxdIzc V:165 H: 456
809086 beq $90a4     [8090a4] A:0000 X:0740 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIZc V:165 H: 490
8090a4 lda $fc       [0000fc] A:0000 X:0740 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIZc V:165 H: 508
8090a6 clc                    A:0740 X:0740 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIzc V:165 H: 536
8090a7 adc #$0040             A:0740 X:0740 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIzc V:165 H: 588
8090aa tax                    A:0780 X:0740 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIzc V:165 H: 606
8090ab cpx $18       [000018] A:0780 X:0780 Y:3c18 S:1de9 D:0000 DB:7e nvmxdIzc V:165 H: 618
8090ad bcc $9082     [809082] A:0780 X:0780 Y:3c18 S:1de9 D:0000 DB:7e NvmxdIzc V:165 H: 646

KingMike

« Reply #16 on: March 12, 2019, 07:24:05 pm »
Only the A register can be used for math instructions (such as adc).

Aaendi

« Reply #17 on: March 12, 2019, 11:55:43 pm »
I'm talking about the loading and storing.  It can just use TXA and TAX and not load and store from $FC until it's out of the loop.
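
To put a number on that idea, here is a rough Python model of the empty-slot scan from the trace above. It only counts reads and writes of $FC; it does not count cycles and does not model what happens after a live object is found. The function names, the dictionary of object words, and the choice to store $FC once on exit are all assumptions made for illustration.

```python
SLOT_SIZE = 0x40  # bytes between objects, from "adc #$0040" in the trace

def scan_original(first_words, start, end):
    """Traced loop: stx $fc on every pass, lda $fc before every increment."""
    x, fc_accesses = start, 0
    while True:
        fc_accesses += 1                  # stx $fc
        if first_words.get(x, 0) != 0:    # lda $00,x / beq
            return x, fc_accesses
        fc_accesses += 1                  # lda $fc
        x += SLOT_SIZE                    # clc / adc #$0040 / tax
        if x >= end:                      # cpx $18 / bcc
            return None, fc_accesses

def scan_in_register(first_words, start, end):
    """Suggested variant: keep the pointer in X, touch $FC only on exit."""
    x = start
    while True:
        if first_words.get(x, 0) != 0:
            return x, 1                   # one stx $fc once a live object is found
        x += SLOT_SIZE                    # txa / clc / adc #$0040 / tax
        if x >= end:
            return None, 0

objects = {0x0780: 0x1234}  # hypothetical: only the slot at $0780 is live
print(scan_original(objects, 0x0700, 0x08C0))
print(scan_in_register(objects, 0x0700, 0x08C0))
```

With mostly empty object tables the original form touches $FC twice per empty slot, which is where the saving would come from.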

slidelljohn

« Reply #18 on: March 14, 2019, 04:17:07 pm »
I still haven't done anything else with gradius 3 but I plan to start working on this project again
soon.

@darkmoon2321
Maybe it would be a good idea to post your asm code. If enough of us put our ideas together we could
probably optimize the code more.

I have been gone for a little while but I'm back with some good news on something that I have been working on for the past few weeks. devinacker, who has been updating bsnes plus, created a branch of bsnes-plus-v04 that uses the 128kb vram mod. The branch is called vramexpand https://github.com/devinacker/bsnes-plus/tree/vramexpand. The games appear to be working correctly with the extra vram, but some of the debugging features don't work correctly for the extra vram. I decided to play around with the source code for the branch to get some of the debugging features working for the extra vram, and I'm having good success in modifying the code. Not only do I have some of the extra vram data loading but I'm also adding new features. Here is an image showing the original tile viewer and the one I modified.


I added some new buttons and a height scroll box. I also changed how the tile viewer screen gets updated. It now updates almost instantly. Not everything works 100% in the tile viewer yet, but the vram 4bpp settings are mostly correct.

I added a feature that lets you control what debugging windows are opened when you open bsnes plus.


You can either choose which windows to have open at start up, or you can have the windows that were open when you closed bsnes reopen the next time you open bsnes. I plan on adding more windows to the list and starting options for each window.

Now the modification that took the most time is the new cpu tracer. I created a multi-threaded, multi-buffer, near lag-free cpu tracer with a slightly new text format in trace files. The tracer allows for up to ~2gb trace files and it has a progress bar to show the size of the file being generated. The cpu step and breakpoints are also displayed differently in the console. I also added a clear console button.


The new algorithm to generate the trace file still has room for improvement. I should have the new algorithm finished soon.

Here is a link to the vramexpand and the modified vramexpand with the changes that I made.
http://www.mediafire.com/file/jve2buz9iqnzg6o/bsnes_plus_files.zip/file

This is how you change the vram settings.



64kb is what is in an original snes, 128kb is how much the snes was actually capable of using, and 256kb uses a mod that lets you switch between 2 different sets of 128kb of vram. I haven't tested the 256kb yet but I will soon.
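
One way to picture the three settings is as a mapping from the 16-bit vram word address to a byte offset in the ram chips. The Python sketch below is only an illustration: it assumes the stock 64kb setup ignores the top address bit, and the bank argument for the 256kb case is hypothetical, since the post does not say how the switch between the two 128kb sets works.

```python
def vram_byte_offset(word_address, vram_kb=64, bank=0):
    """Map a 16-bit VRAM word address to a byte offset in the backing ram."""
    if vram_kb == 64:
        # Assumption: only 15 address bits are wired, so bit 15 is ignored.
        return (word_address & 0x7FFF) * 2
    offset = (word_address & 0xFFFF) * 2   # all 16 bits used: 128kb of words
    if vram_kb == 256:
        offset += (bank & 1) * 0x20000     # hypothetical select between two 128kb sets
    return offset

for size in (64, 128, 256):
    print(size, hex(vram_byte_offset(0x8000, size, bank=1)))
```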

As far as I can tell there are no major bugs or memory leaks from the modifications that I made. Nothing ever crashes. You do need at least 2gb of ram for the tracer. I hope you all like the modifications that I made. I definitely wouldn't mind helping contribute to the official source that devinacker maintains.

darkmoon2321

« Reply #19 on: March 16, 2019, 01:46:50 am »
Quote from: slidelljohn
@darkmoon2321
Maybe it would be a good idea to post your asm code. If enough of us put our ideas together we could
probably optimize the code more.

Here is a commented disassembly of the optimized code I did for sprite handling:

Code: [Select]
$80/9020 LDX #$0200 ;Initial address of object data
$80/9023 LDA #$08C0 ;final address
$80/9026 BRA $58    [$9080] ;process all objects between initial and final
$80/9028 LDA $000ED0
$80/902C CMP #$008F
$80/902F BEQ $11    [$9042]
$80/9031 LDA $0011D0
$80/9035 CMP #$0095
$80/9038 BEQ $34    [$906E]
$80/903A LDX #$08C0 ;Initial address of object data
$80/903D LDA #$1200 ;final address
$80/9040 BRA $3E    [$9080] ;process all objects between initial and final
$80/9042 LDX #$08C0 ;Initial address of object data
$80/9045 LDA #$0E40 ;final address
$80/9048 JSL $809080[$80:9080] ;process all objects between initial and final
$80/904C BCS $62    [$90B0] ;exit if no more room for sprites
$80/904E LDX #$0EC0 ;Initial address
$80/9051 LDA #$0F00 ;final address of object data
$80/9054 JSL $809080[$80:9080] ;process all objects between initial and final
$80/9058 BCS $56    [$90B0] ;exit if no more room for sprites
$80/905A LDX #$0E40 ;Initial address of object data
$80/905D LDA #$0EC0 ;final address
$80/9060 JSL $809080[$80:9080] ;process all objects between initial and final
$80/9064 BCS $4A    [$90B0] ;exit if no more room for sprites
$80/9066 LDX #$0F00 ;Initial address of object data
$80/9069 LDA #$1200 ;final address
$80/906C BRA $12    [$9080] ;process all objects between initial and final
$80/906E LDX #$0900 ;Initial address of object data
$80/9071 LDA #$1200 ;final address
$80/9074 JSL $809080[$80:9080] ;process all objects between initial and final
$80/9078 BCS $36    [$90B0] ;exit if no more room for sprites
$80/907A LDX #$08C0 ;Initial address of object data
$80/907D LDA #$0900 ;final address
$80/9080 STA $18 ;Store final address for object data (terminator)
$80/9082 STX $FC ;store address of object data
$80/9084 LDA $00,x ;Object word 0, offset to 03:0000 array that determines the sprite data
$80/9086 BEQ $1C    [$90A4] ;skip this slot if object word 0 is zero (no active object)
$80/9088 STA $04 ;store the offset to the sprite data in ROM
$80/908A LDA $0A,x ;Object X position
$80/908C STA $08
$80/908E LDA $0E,x ;Object Y position
$80/9090 STA $0A
$80/9092 LDA $02,x ;Object alternate attributes
$80/9094 STA $14 ;"Alternate attributes/palette" flag stored in top bit
$80/9096 ASL A
$80/9097 XBA
$80/9098 STA $13 ;store alternate attributes/palette
$80/909A LDA $04,x ;Object attributes
$80/909C STA $06
$80/909E LDX $04 ;Load relative offset of ROM sprite data into X
$80/90A0 BRA $0F    [$90B1]
$80/90A2 BCS $0C    [$90B0] ;entry from $80/9187: exit with carry set if OAM is full
$80/90A4 LDA $FC ;Load address of object data
$80/90A6 CLC
$80/90A7 ADC #$0040 ;increment to the next object address
$80/90AA TAX
$80/90AB CPX $18 ;check for final address
$80/90AD BCC $D3    [$9082] ;loop until we reach the final address
$80/90AF CLC ;return carry clear, we still have room for more sprites
$80/90B0 RTL ;end sprite tile processing for the current set of objects
$80/90B1 LDA $030000,x ;get # of tiles in the sprite
$80/90B5 AND #$00FF
$80/90B8 STA $16 ;store num_tiles
$80/90BA INX
$80/90BB LDA $030002,x
$80/90BF XBA
$80/90C0 CMP #$FF00 ;check for FF value in last byte of sprite tile's data
$80/90C3 BCC $07    [$90CC]
$80/90C5 LDA $030000,x ;if last byte was FF, get new ROM offset for sprite data
$80/90C9 TAX
$80/90CA BRA $EF    [$90BB]
$80/90CC AND #$0010 ;check sprite size
$80/90CF BNE $07    [$90D8]
$80/90D1 STZ $0E ;store the sprite size toggle bit
$80/90D3 LDA #$FFFC ;load amount to offset the tile based on its size
$80/90D6 BRA $08    [$90E0]
$80/90D8 LDA #$0011
$80/90DB STA $0E ;store the sprite size toggle bit
$80/90DD LDA #$FFF8
$80/90E0 STA $0C ;store tile offset modifier based on sprite size
$80/90E2 STZ $11 ;Initialize upper byte of tile X position (16 bit) to zero
$80/90E4 LDA $02FFFF,x ;get tile's relative X position
$80/90E8 BPL $02    [$90EC]
$80/90EA DEC $11 ;if relative X is negative, set upper byte to FF
$80/90EC STA $10 ;store lower byte of relative X position
$80/90EE BIT $06 ;check object's attributes
$80/90F0 BVC $07    [$90F9] ;check if tile is horizontally flipped
$80/90F2 SEC
$80/90F3 LDA $08
$80/90F5 SBC $11 ;subtract tile relative X from object X if hflip
$80/90F7 BRA $05    [$90FE]
$80/90F9 CLC
$80/90FA LDA $08
$80/90FC ADC $11 ;if not flipped, add tile relative X to Object X
$80/90FE CMP #$0110 ;right boundary check
$80/9101 BMI $03    [$9106]
$80/9103 JMP $917B  [$80:917B] ;If we get here, do not draw tile
$80/9106 CMP #$FFF0 ;left boundary check
$80/9109 BPL $03    [$910E]
$80/910B JMP $917B  [$80:917B] ;If we get here, do not draw tile
$80/910E CLC
$80/910F ADC $0C ;Add relative offset based on sprite size toggle bit
$80/9111 STA $0000,y ;store final tile X to be transmitted to OAM
$80/9114 XBA
$80/9115 LSR A
$80/9116 ROR $00 ;rotate X most significant bit into extended OAM table temp variable
$80/9118 STZ $11 ;Initialize upper byte of tile Y position (16 bit) to zero
$80/911A LDA $030000,x ;get tile's relative Y position
$80/911E BPL $02    [$9122]
$80/9120 DEC $11 ;if relative Y is negative, set upper byte to FF
$80/9122 STA $10 ;store lower byte of relative Y position
$80/9124 BIT $06 ;check object's attributes
$80/9126 BPL $07    [$912F] ;check if tile is vertically flipped
$80/9128 SEC
$80/9129 LDA $0A
$80/912B SBC $11 ;if vflip, subtract relative Y from object Y
$80/912D BRA $05    [$9134]
$80/912F CLC
$80/9130 LDA $0A
$80/9132 ADC $11 ;if not flipped, add relative Y to object Y
$80/9134 BIT #$FF00 ;Make sure the upper byte of tile Y position is 0 at this point
$80/9137 BEQ $04    [$913D]
$80/9139 ASL $00 ;if upper byte of Y was not 0, pop the X MSB for this tile off the temp variable
$80/913B BRA $3E    [$917B] ;if we get here, do not draw tile
$80/913D SEC
$80/913E SBC #$0010 ;subtract $10 (16 decimal) from tile Y position
$80/9141 CLC
$80/9142 ADC $0C ;add relative offset based on sprite size toggle bit
$80/9144 STA $0001,y ;Store final tile Y to be transmitted to OAM
$80/9147 LDA $030002,x ;get tile's relative attributes and tile #
$80/914B AND #$EFFF ;clear bit 12, the sprite size flag tested at $80/90CC (vflip is bit 15, so it is not affected here)
$80/914E EOR $06 ;toggle bits from the object's attributes
$80/9150 BIT $14 ;check "alternate attributes/palette" flag
$80/9152 BPL $05    [$9159]
$80/9154 AND #$F1FF ;clear palette bits
$80/9157 ORA $13 ;set alternate attributes/palette
$80/9159 STA $0002,y ;store tile's final attributes and tile # to be transmitted to OAM
$80/915C INY ;update write offset for next tile's data
$80/915D INY
$80/915E INY
$80/915F INY
$80/9160 LSR $0E ;rotate the size toggle bit into our temp extended OAM variable
$80/9162 ROR $00
$80/9164 BCC $15    [$917B] ;check if we have filled our temp variable
$80/9166 CPY #$3E00 ;check if we have used up all available OAM space
$80/9169 BCS $1C    [$9187] ;if so, end sprite tile processing
$80/916B LDA $00 ;load our temp extended OAM variable
$80/916D STA ($02) ;indirect store the extended OAM table
$80/916F LDA $02
$80/9171 ADC #$0002 ;update the write address for extended OAM data
$80/9174 STA $02
$80/9176 LDA #$8000
$80/9179 STA $00 ;reset our temp extended OAM variable
$80/917B INX ;update the read offset for the next tile's data from ROM
$80/917C INX
$80/917D INX
$80/917E INX
$80/917F DEC $16 ;countdown until all tiles in the object have been processed
$80/9181 BEQ $03    [$9186]
$80/9183 JMP $90BB  [$80:90BB] ;loop over all tiles in the object
$80/9186 CLC
$80/9187 JMP $90A2  [$80:90A2] ;start on the next object
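
To make the tile X math at $80/90E2-$80/9116 easier to follow, here is a rough Python model of how I read it. The function names are made up for illustration, and it only mirrors the arithmetic (sign extension, hflip add/subtract, the two boundary checks, and the size-based offset), not the actual register usage:

```python
MASK16 = 0xFFFF

def sign_extend8(byte):
    """Sign-extend an 8-bit value to 16 bits (the STZ $11 / DEC $11 step)."""
    byte &= 0xFF
    return byte | 0xFF00 if byte & 0x80 else byte

def tile_screen_x(object_x, rel_x, hflip, size_offset):
    """Return (OAM X low byte, X MSB), or None if the tile is culled.

    size_offset is $FFFC or $FFF8, as loaded at $80/90D3 and $80/90DD.
    """
    rel = sign_extend8(rel_x)
    if hflip:
        x = (object_x - rel) & MASK16   # SEC / SBC $11
    else:
        x = (object_x + rel) & MASK16   # CLC / ADC $11
    # CMP #$0110 / BMI: keep going only if (x - $0110) is negative
    if not ((x - 0x0110) & 0x8000):
        return None
    # CMP #$FFF0 / BPL: keep going only if (x - $FFF0) is non-negative
    if (x - 0xFFF0) & 0x8000:
        return None
    x = (x + size_offset) & MASK16      # CLC / ADC $0C
    return x & 0xFF, (x >> 8) & 1       # low byte to OAM, bit 8 to the extended table
```

For example, an object at X = $0080 with a tile offset of $10 and the small-sprite offset $FFFC comes out at $8C unflipped and $6C when hflipped.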

For the most part, it's pretty similar to the original, with a few exceptions.  In the original, the ROM location where tile data was located was loaded and stored into the X register a few times for every tile because the X register was re-used to hold the location of the extended OAM table in RAM.  Similarly, the address of the write location for tile data needed to be loaded for every tile because they also temporarily used the Y register to hold the value for the size-toggle dependent position offset.  They also checked for vertical/horizontal flips by doing:
Code: [Select]
LDA $06
BIT #$8000  ;#$4000 for hflip
BEQ $07    [$9135]

I simplified this using BIT $06 and either BVC or BPL.  There were several other minor modifications as well.  As for speeding up the math with $FC by using TAX/TXA, it's doable, but the value in the X register will need to be restored to the address of the object's data as soon as the object's sprites are finished being drawn.  In other words, this will only save processing time for the case where there is no active object in a slot.
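
As a sanity check on the BIT change, here is a small Python sketch (helper names are hypothetical). With a 16-bit memory operand, BIT copies bit 15 of the operand into N and bit 14 into V, so BPL/BVC take the branch in exactly the cases where the original LDA / BIT #imm / BEQ sequence did:

```python
def bit_mem16(value):
    """N and V flags after a 16-bit BIT with a memory operand."""
    return bool(value & 0x8000), bool(value & 0x4000)

def original_branch_taken(attrs, mask):
    # LDA $06 / BIT #mask / BEQ: branch when the tested bit is clear
    return (attrs & mask) == 0

def simplified_branch_taken(attrs, flag):
    # BIT $06 followed by BPL (vflip, N flag) or BVC (hflip, V flag)
    n, v = bit_mem16(attrs)
    return not (n if flag == "vflip" else v)

# Both versions agree for every combination of the two flip bits.
for attrs in (0x0000, 0x4000, 0x8000, 0xC000, 0x3FFF, 0xFFFF):
    assert original_branch_taken(attrs, 0x8000) == simplified_branch_taken(attrs, "vflip")
    assert original_branch_taken(attrs, 0x4000) == simplified_branch_taken(attrs, "hflip")
```

The saving is the LDA plus the immediate operand, since the flags come straight from memory.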

Some info on darkmoon's sprite routine hack: It introduces glitchy graphics on the fire stage boss (the two-headed dragon) that aren't in the original fastrom hack.

I didn't know this.  I'll have to take a look and see if I can find out what's happening.  I must have missed something.
Edit: I'm not convinced this is from me.  I just ran both versions up to this boss using an invincibility cheat to test, and both showed some pretty glitchy graphics.  I think there are just too many overlapping sprites trying to be displayed at once.  The way the boss curls around on itself makes it particularly prone to glitchy sprites.  However, I did find a potential bug.  At $80/9166, it should be changed to:

Code: [Select]
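LDA $00 ;load our temp extended OAM variable
STA ($02) ;indirect store the extended OAM table
CPY #$3E00 ;check if we have used up all available OAM space
BCS $18    [$9187] ;if so, end sprite tile processing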

In other words, extended OAM data needs to be saved prior to checking whether there are too many sprites on-screen. Otherwise the extended OAM data (size toggle, X MSB) might not be saved for the last few sprite tiles in the event that there are too many sprites on-screen at once.
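
Here is a simplified Python model of why the order matters. It is only a sketch: `max_tiles` stands in for the CPY #$3E00 limit (I'm assuming the limit lands on a multiple of 8 tiles), the function names are made up, and skipped tiles (the ASL $00 path) are left out. The temp variable starts at $8000, and each tile rotates in two bits (X MSB, then size), so the sentinel bit falls into carry after 8 tiles:

```python
def ror16(value, carry_in):
    """16-bit rotate right through carry; returns (new value, carry out)."""
    carry_out = value & 1
    return (value >> 1) | (carry_in << 15), carry_out

def pack_ext_oam(tiles, max_tiles, flush_before_check):
    """tiles: list of (x_msb, size) bits. Returns the extended OAM words stored."""
    words = []
    temp = 0x8000                       # sentinel bit marks "empty"
    written = 0
    for x_msb, size in tiles:
        temp, _ = ror16(temp, x_msb)    # ROR $00 at $80/9116
        written += 1
        temp, full = ror16(temp, size)  # ROR $00 at $80/9162
        if not full:
            continue                    # BCC: word not filled yet
        if flush_before_check:          # proposed order
            words.append(temp)
            if written >= max_tiles:
                break
        else:                           # order in the listing above
            if written >= max_tiles:
                break                   # exits before the last word is stored
            words.append(temp)
        temp = 0x8000
    return words

tiles = [(1, 0)] * 16
print(len(pack_ext_oam(tiles, 16, flush_before_check=False)))  # 1 word stored
print(len(pack_ext_oam(tiles, 16, flush_before_check=True)))   # 2 words stored
```

In the first case the last 8 tiles never get their size toggle and X MSB written, which matches the potential bug described above.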
« Last Edit: March 16, 2019, 05:38:09 pm by darkmoon2321 »