SNES Audio Streaming code

Started by gauveldt, April 01, 2014, 10:07:47 AM

Bregalad

Quote
How does a PAL SNES change in the SPC?
Are the timers or pitch-to-sample-rate any different?
As far as I know, the answers to those two questions are, respectively: nothing, and no.

However, VBlank runs at 50 Hz and the CPU runs at a different pace. This effectively changes the CPU cycle / SPC cycle ratio, which itself varies slightly between consoles of the same region.

gauveldt

#21
Quote from: Bregalad on April 13, 2014, 02:44:08 PM
As far as I know, the answers to those two questions are, respectively: nothing, and no.

However, VBlank runs at 50 Hz and the CPU runs at a different pace. This effectively changes the CPU cycle / SPC cycle ratio, which itself varies slightly between consoles of the same region.

Actually, I recently traced through the slice transfer routine in a debugger, and even with my highly optimized SPC loop accepting data from the SNES, the SNES side did a lot of spinning through - cmp $40 : bne - ;(D=2100). It spins 12-13 times per pass, which adds up to roughly 2664-2886 spins by the time an entire slice has transferred. Some of those spins are burned every 32k, when the ROM read crosses a bank boundary and the bank has to be advanced. So I think a PAL SNES would just burn up a few of those spins first, before the routine actually started to slow down.

one pass of writes looks like this now on the SPC side:

                    -   -- cmp x,$f7 : bne --       ; wait for nonce
                        ; *-*-*-*
                        mov  a,$f4                  ; get first byte
                        mov  (bufWrPtr)+y,a : inc y ; store 1st byte
                        mov   a,$f5
                        mov  (bufWrPtr)+y,a : inc y ; store 2nd byte
                        mov   a,$f6
                        mov  (bufWrPtr)+y,a         ; store 3rd byte
                        mov   $f4,x                 ; acknowledge (ASAP)
                        ; -*-*-*-
                        inc   y
                        inc   x                     ; ready for next write
                        cmp   x,#86                 ; check for done (255 bytes is max we can index)
                    bne -

12 spins through - cmp $40 : bne - occur on the SNES side between *-*-*-* and -*-*-*-
I unroll this to advance bufWrPtr+1 on the passes where Y wraps from $ff to $00.  I also unroll it to do an OR on the brr header (x pass 221/223). The value to OR in is set before the first pass: 3 when the slice is going to write the last brr in the ring buffer (the 3 turns on LOOP+END in the brr header byte), otherwise 0.  There's a bit of slack at the end, since the SPC is also spinning a few times at -   -- cmp x,$f7 : bne --       ; wait for nonce every pass.
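The conditional OR described above can be sketched in a few lines of Python. This only illustrates the bit logic, it is not the SPC700 code itself, and the names BRR_END, BRR_LOOP and patch_header are made up for the example. It relies on the BRR header byte keeping END in bit 0 and LOOP in bit 1, which is why the value 3 turns both on.

```python
BRR_END = 0x01   # bit 0 of the BRR header byte
BRR_LOOP = 0x02  # bit 1 of the BRR header byte


def patch_header(header, is_last_block_in_ring):
    """OR LOOP+END into a BRR header byte only for the ring buffer's last block.

    The mask is chosen once (3 or 0), so the OR itself is unconditional.
    This mirrors the idea of setting the value before the first pass.
    (Hypothetical helper, not taken from the posted source.)
    """
    mask = (BRR_LOOP | BRR_END) if is_last_block_in_ring else 0
    return header | mask


print(hex(patch_header(0xC0, True)), hex(patch_header(0xC0, False)))
```

Picking the mask up front keeps the branch out of the timing-critical part of the loop, which seems to be the point of the approach.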

I might be able to use a couple of slave BufWrPtr's (offset by 1 and 2 from the master) to move the two inc y instructions outside the *-*-*-* and -*-*-*- part.

If I ever make an SPC driver intended to do the audio slice transfer via HDMA to 2140-2143, a PAL SNES would require different cycle timing in the unrolled loop doing the transfer.

All those spins are bothering me though, since another vblank is occurring mid-transfer and the transfer was intended to fit in the time between two vblanks.

That SPC loop needs to speed up greatly or it has no chance of keeping up to HDMA.  I think it will need to be fully unrolled in that case.

April 15, 2014, 03:53:25 AM - (Auto Merged - Double Posts are not allowed before 7 days.)

Source to SNES Audio streaming code 2014.04.19 (updated 2014.04.19 19:30 PST) -- moved to github 2015.08.13

There it is for anyone that wants to play around with it or try it on real hardware.
This version includes a funky VU bar that won't work in ZSNES (ZSNES reports DSP OUTX registers as 00).

I don't have real hardware to test this on, so I've coded the timer targets to keep in sync with BSnes/Higan, which I believe is the most accurate of the emulators I test with.  (It assumes target=4 makes stage 2 count as 0,1,2,3,*0,1,2,3,*0,1...  where * marks an advance of the 4-bit stage 3 counter.  This assumption causes overflow in Geiger9X and underflow in ZSNES.)  If someone tests this on real hardware, please report the result and hopefully prove Anomie's APU docs correct.
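To make the counting assumption concrete, here is a small Python model of it. It is a sketch only: count_stage3_advances and its period_error parameter are invented for illustration, and the +1 / -1 cases are hypothetical models of an off-by-one emulator, not descriptions of how any particular emulator is implemented.

```python
def count_stage3_advances(ticks, target, period_error=0):
    """Return (total_advances, stage3_value) after `ticks` stage-1 ticks.

    period_error=0 is the behaviour assumed in the post: stage 2 counts
    0..target-1 and each wrap advances the 4-bit stage 3 counter.
    period_error=+1 / -1 are hypothetical off-by-one variants.
    """
    period = target + period_error
    if period < 1:
        raise ValueError("period must be at least 1")
    stage2 = 0
    advances = 0
    for _ in range(ticks):
        stage2 += 1
        if stage2 == period:   # stage 2 wraps back to 0 ...
            stage2 = 0
            advances += 1      # ... and stage 3 advances once
    return advances, advances & 0x0F  # stage 3 is only 4 bits wide


for label, err in (("assumed", 0), ("one tick longer", +1), ("one tick shorter", -1)):
    print(label, count_stage3_advances(8000, 4, err))
```

The 8000-tick run length is arbitrary. The point is that a one-tick difference in the period changes how many stage 3 advances the driver sees over the same stretch of time, which is what would let the writer drift relative to playback.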

1. Stick the audio (INCBIN) into the ROM image from file offset $008000.
2. Pad the brr audio file to a multiple of 9*32768 (9 banks, $048000).
3. Set AUDBANK_HI appropriately: A 9-bank brr would be AUDBANK_LO=$01, AUDBANK_HI=$0A
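Step 2 can be done with a short script instead of a hex editor. A minimal sketch, assuming the whole .brr file fits in memory; pad_brr is a made-up helper name, and the pad byte is $00.

```python
BANK_SIZE = 32768
LOOP_UNIT = 9 * BANK_SIZE  # $048000, the 9-bank unit from step 2


def pad_brr(data, unit=LOOP_UNIT):
    """Return `data` zero-padded up to the next multiple of `unit` bytes."""
    remainder = len(data) % unit
    if remainder == 0:
        return data  # already an exact multiple, nothing to add
    return data + bytes(unit - remainder)


padded = pad_brr(bytes(100000))
print(len(padded), hex(len(padded)))
```

To use it on a real file, read the bytes with open(path, "rb"), pass them through pad_brr, and write the result back out before assembling.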

The source uses the SA-1 MMC registers to bank the first 1-2MB of ROM to 00-3f, 3-4 of RAM to 80-bf and 5-8MB to c0-ff.
The audio code will play into the $c0 region but I have some issue trying to loop back correctly to AUDBANK_LO (the issue is with getting a proper multiple of 9 so brr chunks aren't mismatched).  Someone who fixes this please let me know :)  Thanks.

I've tried to unroll the SPC code a bit to make the SNES spin less in the transfer code but it's still doing 10-11 spins per pass during transfer (got it down from 12-13 tho).  In theory it'll spin less in PAL since the S-CPU is running slower relative to the SPC's code.

Assembled with Asar.
Sample build batchfile included.

Known issue:
A utility like IpsAndSum may be needed to correct the ROM header checksum.

I haven't included any audio files due to the same taboos as asking for ROMs in the forums. :P

Bregalad

#22
I managed to compile and test it on emulation, but my PowerPak doesn't like the ROM, it complains about the lack of a header. However I think the problem is that you specified usage of the SA-1 chip in the header, which makes it incompatible with the Power Pak.

Can't you make a version without the SA-1 chip, so I can test that on my Power Pak ?

EDIT

Ok I changed the header on my own, so it is $20, $00, $0b, $07, $01, $01, .... (simple LoROM, ROM only cartridge)

However the Power Pak still complains that it doesn't find the header at $7fc0 when it's here obviously... I don't understand :(

gauveldt

#23
Quote from: Bregalad on April 17, 2014, 05:35:07 PM
I managed to compile and test it on emulation, but my PowerPak doesn't like the ROM, it complains about the lack of a header. However I think the problem is that you specified usage of the SA-1 chip in the header, which makes it incompatible with the Power Pak.

Can't you make a version without the SA-1 chip, so I can test that on my Power Pak ?

EDIT

Ok I changed the header on my own, so it is $20, $00, $0b, $07, $01, $01, .... (simple LoROM, ROM only cartridge)

However the Power Pak still complains that it doesn't find the header at $7fc0 when it's here obviously... I don't understand :(

It's weird that Geiger9X can detect the SA-1 Super Mario World ROM hack but fails on my ROM, even though the relevant header bytes at ROM file offset 00:7fc0 all match up in my hex editor (except the ROM title).  Geiger9X doesn't even report the title correctly, just garbage.  Are there game-specific hacks in Geiger9X/SNES9X checking against the ROM titles?

I have no idea what's going on there.  The header detects fine in ZSNES and Higan, but SNES9X/Geiger9X reports garbage.
org $00ffc0
db "Anime Mayhem RPG     "          ; 21-byte game title
db $23                              ; Use SA-1
db $35                              ; SA-1+ROM+RAM+BATTERY
db $0D                              ; ROM SIZE > 32 Mbit
db $07                              ; 128 KB BW-RAM
db $01                              ; NTSC
db $01                              ; hk
db $00                              ; Rom Version

The bytes $23,$35 immediately after the game title are the two that need to change to make it a non-SA-1 ROM (and $0d,$07 as well, to limit the ROM to 4MB and the SRAM to 32KB).  The "sa1rom 0,1,2,3" mapping directive around line 4 also needs to be commented out (it probably needs to be replaced with the 'lorom' mapper directive instead).

You should probably also comment out the bank setup code around line 166:

    ; init 8MB MMC banking
    ; hirom hole takes rom4-rom7
    ; bit 7 is off meaning lorom hole
    ; has rom0-rom3
;    %MC8XY8()
;    ldx #$00
;    lda #$04
;    -
;        sta $2220,X
;        inc a
;        inx
;        cpx #$04
;    bne -
    %MC8XY16()

For a LoROM-only game you might need to truncate the ROM to 4MB.
I wonder if the powerpak requires the optional 'licensee' header that appears just before the $ffc0 main ROM header.

Edit: don't comment out the register size change (%MC8XY16()) at the end of the bank init code.

Bregalad

To be honest, I don't know what the problem is.
I made the header Lo-ROM, ROM only, and 2 megabytes (16 megabits). Barely more than 1 MB is needed for a 1-minute song.

I am not very knowledgeable about SNES headers, unfortunately.

My header at $7fd4 contains the following :

0x20, 0x20, 0x00, 0x0b, 0x07, 0x01, 0x01, 0x00, 0x68, 0x8f, 0x97, 0x70

I think that means Lo-ROM, ROM only, 2 Megabytes, but I'm not 100% sure. At least it seems the powerpak doesn't like it.
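The twelve bytes quoted above can be decoded mechanically. The sketch below assumes the usual SNES header field order after the 21-byte title (map mode, cartridge type, ROM size, RAM size, country, licensee, version, checksum complement, checksum) and that the first quoted byte at $7fd4 is the last character of the title. decode_header_tail is a made-up name for illustration.

```python
import struct

# The 12 bytes quoted in the post, starting at file offset $7fd4.
raw = bytes([0x20, 0x20, 0x00, 0x0b, 0x07, 0x01, 0x01, 0x00,
             0x68, 0x8f, 0x97, 0x70])


def decode_header_tail(data):
    """Split the quoted bytes into header fields (assumed standard layout)."""
    (_title_last, map_mode, cart_type, rom_size, ram_size,
     country, licensee, version, complement, checksum) = struct.unpack("<8B2H", data)
    return {
        "map_mode": map_mode,
        "cart_type": cart_type,
        "rom_bytes": 1024 << rom_size,   # size byte is a power-of-two exponent in KB
        "ram_size_code": ram_size,
        "country": country,
        "licensee": licensee,
        "version": version,
        # checksum and its complement should XOR to $FFFF
        "checksum_pair_ok": (complement ^ checksum) == 0xFFFF,
    }


for key, value in decode_header_tail(raw).items():
    print(key, value)
```

Under that layout the ROM size byte $0b declares 2097152 bytes, so the header asks for exactly 2 MB regardless of how long the assembled file actually is.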

gauveldt

#25
Quote from: Bregalad on April 18, 2014, 04:37:11 PM
To be honest, I don't know what the problem is.
I made the header Lo-ROM, ROM only, and 2 megabytes (16 megabits). Barely more than 1 MB is needed for a 1-minute song.

I am not very knowledgeable about SNES headers, unfortunately.

My header at $7fd4 contains the following :

0x20, 0x20, 0x00, 0x0b, 0x07, 0x01, 0x01, 0x00, 0x68, 0x8f, 0x97, 0x70

I think that means Lo-ROM, ROM only, 2 Megabytes, but I'm not 100% sure. At least it seems the powerpak doesn't like it.

Seems like an odd ROM size might confuse super powerpak.

You might try padding the rom to the exact size of the header (so add 0's in a hex editor to the end until the file size is an exact 2097152).  Once padded Asar will leave the ROM size as is (unless you add data past the end of the current ROM's data).
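The padding can also be scripted rather than done by hand in a hex editor. A rough sketch, assuming a LoROM image with no 512-byte copier header, so that the header sits at file offset $7fc0 and the ROM size byte at $7fd7; pad_rom_to_header_size is a hypothetical helper name.

```python
def pad_rom_to_header_size(rom, header_offset=0x7FC0):
    """Zero-pad a ROM image up to the size its own header declares.

    Assumes the ROM size byte lives 0x17 bytes into the header and encodes
    the size as 1 KB << value (so $0b means 2097152 bytes).
    """
    size_code = rom[header_offset + 0x17]
    declared = 1024 << size_code
    if len(rom) > declared:
        raise ValueError("ROM is already larger than the header says")
    return rom + bytes(declared - len(rom))


# Tiny stand-in image: one 32 KB bank plus a little data, size byte $0b.
demo = bytearray(0x8000 + 100)
demo[0x7FD7] = 0x0B
print(len(pad_rom_to_header_size(bytes(demo))))
```

Refusing to touch a file that is already larger than the declared size is a design choice here: in that case it is the header byte that needs changing, not the file.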

This is a pretty detailed guide on setting up the FFC0 header

Edit: I made the ROM size match 4MB as I set in the header and it started to recognize properly in SNES9X/Geiger9X.

Bregalad

#26
Looks like filesize was the problem, I managed to pad the file to 2 MB and the powerpak now loads fine.

Unfortunately, the demo doesn't work quite well on real hardware.
First some garbage plays, and the very beginning of the song (the first half-second) plays twice. Then it goes fine until 0:14, where suddenly some garbage plays and then a part of the song is played twice again. The same phenomenon appears at 0:28, etc. (every 14 seconds).

Note that I used the timer displayed on screen for indication, not a real stopwatch, so these timings might be off (especially considering I have a PAL SNES).

EDIT : Actually my SNES behaved exactly like BSNES in PAL mode does. So at least it seems you can, to some extent, trust BSNES for this one. Since BSNES in NTSC emulates it fine, there is a chance that adapting the timings to PAL is all that's needed to make it work on my PAL SNES.

gauveldt

#27
Quote from: Bregalad on April 19, 2014, 06:10:19 AM
Unfortunately, the demo doesn't work quite well on real hardware.
First some garbage plays, and the very beginning of the song (the first half-second) plays twice. Then it goes fine until 0:14, where suddenly some garbage plays and then a part of the song is played twice again. The same phenomenon appears at 0:28, etc. (every 14 seconds).

Note that I used the timer displayed on screen for indication, not a real stopwatch, so these timings might be off (especially considering I have a PAL SNES).

That's an underrun condition, and every 14 seconds is rather often. It would appear that a PAL 5A22 doesn't execute fast enough to send enough slices to keep up with a 32 kHz sample rate.  An underrun or overrun caused by a timer fencepost error would have a period of more than a few minutes of playback, so a 14-second period that starts pretty much right out of the gate is further evidence that the PAL 5A22 simply isn't running fast enough to keep up.  The underrun is basically the audio DSP lapping the write thread; the garbage is heard when the writer is scribbling onto the same slice that is being loaded into the DSP for playback.


   bbaaaa 4041  ffffffffff pppp
cccccccc 4243 hhhhh:mm:ss tttt

         vvvvvvvvvvvvvvvv

bbaaaa : bk:addr in rom fetching samples (hex)
cccccccc : audio slice count
4041 : 2140 2141 (hex)
4243 : 2142 2143 (hex)
ffffffffff : elapsed vblanks (decimal)
pppp : vblank "seconds" (ticks every 60 vblanks so on PAL should tick slower than tttt)
tttt : SPC hardware timer seconds (should be same both systems)
hhhh:mm:ss (hours:mins:secs) uses the SPC driver's seconds counter

2140 shows final index echo from the last slice transfer ($DE=222)
2141 is a seconds counter
2142 outputs voice wave height
2143 outputs "buffer slices available" value (==0 SPC buffer full)

Any 0's showing up on the 2143 port display? (It's the second group of numbers from the left).  May have to drop to 16 kHz on PAL.

2143 goes to zero whenever the SPC has enough data and the SNES can do other things that frame.  If it's never zero it's an indication the SNES processor doesn't have enough time to feed the SPC enough data to keep up with the output.

Looks like 16 kHz audio is needed to allow the code to run at the lowest-common-denominator 5A22 speed.
I'll make those changes after work today.

Edit:
Work's over and here's the updated sources.
Source to SNES Audio streaming code 2014.04.19 (updated 2014.04.19 19:30 PST)

The new source uses 16 kHz and thus requires the appended audio .brr to be encoded for 16 kHz.  On the plus side, files will be only half the size they were at 32 kHz, so longer audio fits within the same sized ROM.
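The size and bandwidth difference between the two rates is easy to check with a little arithmetic. The sketch below uses only the BRR format itself (9 bytes encode 16 samples) and nominal 60 Hz / 50 Hz vblank rates. It counts raw audio bytes only and ignores the handshake overhead of the transfer loop, so it is a lower bound on the work the SNES has to do per frame. The function names are made up for the example.

```python
BRR_BLOCK_BYTES = 9      # one BRR block ...
BRR_BLOCK_SAMPLES = 16   # ... decodes to 16 samples


def brr_bytes_per_second(sample_rate_hz):
    """Raw BRR bytes consumed per second of playback."""
    return sample_rate_hz * BRR_BLOCK_BYTES // BRR_BLOCK_SAMPLES


def brr_bytes_per_vblank(sample_rate_hz, vblank_hz):
    """Average raw BRR bytes that must arrive per vblank period."""
    return brr_bytes_per_second(sample_rate_hz) / vblank_hz


for rate in (32000, 16000):
    print(rate,
          brr_bytes_per_second(rate),
          brr_bytes_per_vblank(rate, 60),   # nominal NTSC frame rate
          brr_bytes_per_vblank(rate, 50))   # nominal PAL frame rate
```

Halving the sample rate halves both the file size per second of audio and the number of bytes that have to cross the ports each frame, and a 50 Hz machine always has to move more bytes per frame than a 60 Hz one at the same rate.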

The new source has a fully functioning easter egg.
Hint: The code will read the brr data from a source other than the cartridge ROM if the appropriate hardware is detected allowing you to play a brr of any size (well technically there's a limit but it's huge).

Bregalad

#28
I don't know what's happening, but it still plays data at 32kHz. Maybe it's I who did something wrong ?

April 20, 2014, 10:23:40 AM - (Auto Merged - Double Posts are not allowed before 7 days.)

I tried it on hardware, and it seems to work like a charm. (I tried a 3:00 minute song, should I try a longer one ?)

gauveldt

Quote

April 20, 2014, 10:23:40 AM - (Auto Merged - Double Posts are not allowed before 7 days.)

I tried it on hardware, and it seems to work like a charm. (I tried a 3:00 minute song, should I try a longer one ?)
Good to have it working on real hardware. :)  Thanks.

32 kHz was underrunning in ZSNES with about a 3-5 minute period so it might take 6-10 minutes to know for sure there's no fence post issue with the timer at 16 kHz (slower so any timer fencepost off-by-one will creep along the buffer more slowly before the overrun or underrun is noticeable).  The SPC voice and buffer are not reset when audio loops so letting a 3 minute audio loop a few times would be sufficient to get a burn in test of the audio streaming logic (I.E. the timer used to keep buffer writes in sync to playback).

At this point it would seem bsnes/Higan and Anomie have it right with the SPC timers to the real thing and that ZSNES and Geiger both have off-by-one in their handling of the timer target (takes 1/8000 longer in ZSNES leading to underrun, 1/8000 shorter in SNES9x/Geiger9X leading to overrun).

Bregalad

Well, you'll have to tell me if your program supports song looping, and if so, how, and then I could try something for more than 10 minutes and confirm that everything runs fine.

gauveldt

#31
Quote from: Bregalad on April 20, 2014, 04:24:24 PM
Well, you'll have to tell me if your program supports song looping, and if so, how, and then I could try something for more than 10 minutes and confirm that everything runs fine.
Looping works in the program so long as the last brr chunk of the audio is modified to clear LOOP+END in the brr chunk header then padded at the end with $00's as necessary to make the audio filesize a multiple of $48000 (9 32KB banks).

For a file size of 9 banks placed in the bank immediately following the ASM (01:8000) set AUDBANK_LO to $01 and AUDBANK_HI to $0a (9+1).  Any other size it would be 1+(brr_file_length/32768), adding $40 if 2MB<filesize<=4MB.  SLICE_LEN is not needed for streaming directly from ROM and may be ignored (it is used only when the Easter egg code is active).

incsrc "src/macros.asm"

AUDBANK_LO = $01
;AUDBANK_HI = $b6   ; for candy.brr
;AUDBANK_HI = $92   ; for top-dream.brr
AUDBANK_HI = $40   ; for yururenai.brr
;AUDBANK_HI = $cd   ; for hurricane.brr

SLICE_LEN = $00000ee0   ; slice count of hurricane.brr
.
.
.
You'll see I have values in there to loop some of my own brr's I used to test.
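The AUDBANK_HI rule can be written out as a small calculation. This is a sketch under one interpretation: "filesize" in the rule is read as the size of the whole ROM image (one 32 KB code bank plus the audio), so $40 is added once the end bank goes past $40. The post gives no rule for images above 4MB, so the function refuses those. audbank_hi is a made-up helper name.

```python
BANK_SIZE = 32768


def audbank_hi(brr_file_length):
    """End bank for audio that starts at bank $01 (AUDBANK_LO = $01).

    Assumption: the "+$40" applies when code bank + audio exceed 2MB,
    i.e. when 1 + banks is greater than $40.
    """
    if brr_file_length % BANK_SIZE:
        raise ValueError("pad the brr file to whole 32 KB banks first")
    hi = 1 + brr_file_length // BANK_SIZE
    if hi > 0x80:
        raise ValueError("no rule given in the post for ROM images above 4MB")
    if hi > 0x40:
        hi += 0x40
    return hi


for banks in (9, 63, 81, 117):
    print(banks, hex(audbank_hi(banks * BANK_SIZE)))
```

Under this reading the 9-bank example comes out as $0a, and 63, 81 and 117 banks come out as $40, $92 and $b6, which at least matches three of the values listed in the snippet above.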

In a hex editor go into the brr file 9 bytes from the end (for a file of size $48000 this would be at offset $47ff7) and change the value to value&0xfc (clear the LOOP+END bits of the final brr chunk in the file).  If END is set the audio won't loop, and LOOP should also not be set in the brr file itself (since LOOP is used in the last brr of the buffer to make it circular for constant streaming).
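The same edit can be scripted. A minimal sketch, assuming the data is a whole number of 9-byte BRR chunks and that the chunk to patch is the very last one; clear_loop_end is a hypothetical helper name.

```python
def clear_loop_end(brr):
    """Return a copy of `brr` with LOOP+END cleared in the final chunk header."""
    if len(brr) == 0 or len(brr) % 9:
        raise ValueError("expected a non-empty whole number of 9-byte BRR chunks")
    patched = bytearray(brr)
    patched[-9] &= 0xFC  # header of the last chunk: clear bit 1 (LOOP) and bit 0 (END)
    return bytes(patched)


# Two dummy chunks; the second one has LOOP+END set in its header ($b3).
demo = bytes([0xB0] + [0] * 8 + [0xB3] + [0x11] * 8)
print(hex(clear_loop_end(demo)[9]))
```

Indexing from the end (-9) gives the same position as the offset in the post: for a $48000-byte file, $48000 - 9 = $47ff7.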

Bregalad

OK, seems to work fine for >12 minutes.
Also, I didn't get it to loop PERFECTLY but I don't think that's an issue right now, as you're more for a proof-of-concept.

By the way (it's not really related) but Tales of Phantasia's audio works very poorly on my PAL console, the code was obviously made specific to NTSC consoles.

gauveldt

Quote from: Bregalad on April 21, 2014, 06:32:52 AM
OK, seems to work fine for >12 minutes.
Also, I didn't get it to loop PERFECTLY but I don't think that's an issue right now, as you're more for a proof-of-concept.

By the way (it's not really related) but Tales of Phantasia's audio works very poorly on my PAL console, the code was obviously made specific to NTSC consoles.
So I think it can be mostly confirmed that it works on real hardware and that bsnes/higan (Anomie's SPC emulation) have it right.

It makes sense that ToP doesn't run well on PAL: it was never released outside Japan, and Japan is NTSC, so ToP was never production-tested in a non-NTSC environment.  The audio issues in ToP on PAL are again most likely due to the 5A22 running slower and not having quite enough processor power to keep up with the playback rate among everything else the 5A22 is doing, such as screen graphics, player movement, battle scenes, sprites, DMA and HDMA, etc.

Bregalad

Now I don't see why you don't support arbitrary sample rates and loop points (given that loops are an integer number of BRR blocks of course).
You'd also want to make the sample buffer in SPC RAM smaller, so that this can run with something else loaded at the same time without eating most of the RAM.

gauveldt

Quote from: Bregalad on April 22, 2014, 03:52:15 AM
Now I don't see why you don't support arbitrary sample rates and loop points (given that loops are an integer number of BRR blocks of course).
You'd also want to make the sample buffer in SPC RAM smaller, so that this can run with something else loaded at the same time without eating most of the RAM.
The sample rate changes the timer target, buffer size, slice size and loop unrolling of the transfer code in ways that all have to fit a common multiple of 9, 16 and slice_size, which makes it less than trivial to use any sample rate not divisible by 16.  The large bank count for looping is due to the fact that it takes 9 banks of 32768 bytes for the data to end aligned to a brr chunk. Looping when it isn't aligned causes the data to be misaligned, play garbage and eventually stop altogether when audio nybbles start being read where brr chunk headers are expected.
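The 9-bank figure falls straight out of the numbers. The sketch below computes the least common multiple of the 9-byte brr chunk and the 32768-byte bank, and shows how far past a chunk boundary the data ends after each whole number of banks. The helper names are made up for the example.

```python
from math import gcd

BRR_CHUNK = 9       # bytes per brr chunk
BANK_SIZE = 32768   # bytes per LoROM bank


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def misalignment_after(banks):
    """Bytes past the last chunk boundary after `banks` full banks."""
    return (banks * BANK_SIZE) % BRR_CHUNK


print(hex(lcm(BRR_CHUNK, BANK_SIZE)))
print([misalignment_after(k) for k in range(1, 10)])
```

Because 9 and 32768 share no common factor, a bank boundary and a chunk boundary only coincide every 9 banks ($48000 bytes). Looping back at any other bank count would restart playback partway through a chunk.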

The audio buffer could indeed be made smaller, as few as three or four slices at the absolute minimum.  A smaller buffer, on the other hand, has the tradeoff of going through the slowest pass of the unrolled transfer loop more often (the final brr chunk in the final slice at the end of the buffer space has an extra OR operation to force LOOP+END set in the brr header).

If you find the easter egg, its looping is slice granular rather than bank granular, because it doesn't have to transfer from ROM with the possibility of bank changes mid-transfer while using an optimized transfer loop that uses as few cycles as possible.  Complications such as bank changes or finer looping control may further slow down the SNES-side transfer loop, further lowering the maximum sample rate the routine can keep up with (the PAL slowdown alone already leaves it unable to keep up with 32 kHz).

furrykef

The tinyupload URL in the original post no longer works, and in any case something like this really should be uploaded to something like github.

Does anyone have a copy of the 2014.04.19 version of the code (or a later version, if one exists)?

gauveldt

Quote from: furrykef on July 14, 2015, 09:56:13 AM
The tinyupload URL in the original post no longer works, and in any case something like this really should be uploaded to something like github.

Does anyone have a copy of the 2014.04.19 version of the code (or a later version, if one exists)?

I am unable to repost the file until I'm back home at my PC.  Using my phone to reply while away to visit Dad while in hospital.  Should be sometime on or after 21st after I return home.

furrykef

OK. I could help you host it on github, or even host it myself (with due credit of course).

gauveldt

Quote from: furrykef on July 15, 2015, 10:28:54 AM
OK. I could help you host it on github, or even host it myself (with due credit of course).

I'm going to upload the ZIP to a new tinyupload link. Hopefully it'll stay alive there long enough for you to download it, unzip it, put it up on github and maybe post a github link back to this thread?

Uploaded 2015.07.30: http://s000.tinyupload.com/index.php?file_id=09288761622779528006