
Author Topic: SNES Audio Streaming code  (Read 15995 times)

gauveldt

SNES Audio Streaming code
« on: April 01, 2014, 10:07:47 am »
Source to SNES Audio streaming code 2014.04.19 (updated 2014.04.19 19:30 PST)
Edit: Now on github

Edit: The more recently uploaded sources (on github) use an SPC driver for accurate timing and buffer sync.  The hack to stream audio without a driver, using only the IPL bootloader, remains here in the OP as historical trivia.

Here's a very clever hack to do audio streaming on the SNES, and it does so without a single line of SPC-700 ASM; in other words, it badly abuses the IPL bootloader. :P

Just as a way to try this quick and dirty over a night of insomnia I used the reference article about accessing SPC-700 DSP with no SPC-700 code.  The routines basically use the nonce+=$22 trick (simplifies the transfer code).
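
For reference, the command-byte arithmetic that spc_begin_upload (posted further down) performs can be modelled in a few lines of Python. This is only a sketch of the add-and-skip-zero step; `next_command` is a made-up name, and the value read from $2140 is passed in as a plain integer.

```python
def next_command(port0: int) -> int:
    """Model of the command byte spc_begin_upload sends: last $2140 value plus $22."""
    cmd = (port0 + 0x22) & 0xFF   # 8-bit add, carry discarded
    if cmd == 0:                  # the routine never sends zero ...
        cmd = 1                   # ... mirroring its `bne + : inc` special case
    return cmd

print(hex(next_command(0xAA)))    # $AA + $22
print(hex(next_command(0xDE)))    # $DE + $22 wraps to zero, so it is bumped
```

The second call shows the one input where the plain add would produce zero and the special case kicks in.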

It works in SNES9X (Geiger debugger), bsnes/higan (I use higan on my hi-end machine) and ZSNES and probably on the real thing.

The SPC-700 RAM is set up fairly simply.
I store a small sample directory at $200 which points to a 2 second audio buffer at $1688-$7fff (as an exercise you could even use the area above $8000 to turn on echo, which needs additional RAM).

I clear this buffer to zero except for the very last chunk's header which is set to $03 (which enables it as a repeating ring buffer when the voice is keyed on).

I set the voice to use 24 kHz (not the full 32 kHz).  The reason is that 24000 divides evenly by 16 and by 60 (60 is the frame rate, 16 is samples per chunk), leaving a mere 225 bytes, or 25 brr chunks, to fill in per frame.
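
The numbers above can be checked with a quick Python sketch. It assumes a nominal 60 Hz frame rate and the standard 9-byte BRR chunk (1 header byte plus 8 data bytes for 16 samples); the variable names are just for illustration.

```python
SAMPLE_RATE = 24000        # Hz, the voice rate chosen above
FRAME_RATE = 60            # nominal frames per second (assumption)
SAMPLES_PER_CHUNK = 16     # one BRR chunk decodes to 16 samples
BYTES_PER_CHUNK = 9        # 1 header byte + 8 data bytes

samples_per_frame = SAMPLE_RATE // FRAME_RATE               # 400
chunks_per_frame = samples_per_frame // SAMPLES_PER_CHUNK   # 25
bytes_per_frame = chunks_per_frame * BYTES_PER_CHUNK        # 225

BUF_START, BUF_END = 0x1688, 0x8000
buffer_bytes = BUF_END - BUF_START                          # 27000
frames_in_buffer = buffer_bytes // bytes_per_frame          # 120
buffer_seconds = frames_in_buffer / FRAME_RATE              # 2.0

print(bytes_per_frame, frames_in_buffer, buffer_seconds)
```

So the odd-looking start address $1688 is simply $8000 minus exactly 120 frames of 225 bytes.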

The next required step is to initialize your buffer pointer.
Code: [Select]
ldx #$1688
stx swAUDBUF    ; anywhere in your direct page will work

The buffer starts at $1688 in SPC RAM, and you want to initialize a pointer that will start at the next frame, at SPC address $1769 ($1688+225).  This pointer will count up to the final frame of the buffer, then reset to the first.

Your NMI routine should add 225 to the buffer pointer every frame and wrap it to $1688 when it counts to $8000 (it should never be seen at $8000 or above outside of vblank).
Code: [Select]
onNMI:
    phb : phd
    rep #$38 : pha : phx : phy

    ; updates audio position
    ; to keep pointer 1 frame ahead of
    ; audio as it plays
    lda swAUDBUF
    clc
    adc.w #$00e1 ; 225 bytes per frame (25 brr chunks at 24 kHz)
    cmp #$8000
    bcc +
        lda #$1688
    +
    sta swAUDBUF

    %MC8()
    lda $4210

    rep #$38 : ply : plx : pla
    pld : plb
    rti                             ; Done with interrupt
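
As a sanity check on the wrap logic in onNMI, here is a small Python model of the pointer update. It is a sketch only; `advance` is a made-up name that stands in for the add/compare/reload sequence.

```python
BUF_START, BUF_END, FRAME_BYTES = 0x1688, 0x8000, 0xE1

def advance(ptr: int) -> int:
    """One NMI's worth of pointer update: add 225, wrap on reaching $8000."""
    ptr += FRAME_BYTES
    if ptr >= BUF_END:          # same test as `cmp #$8000 : bcc +`
        ptr = BUF_START
    return ptr

ptr = BUF_START + FRAME_BYTES   # $1769, one frame ahead of playback
seen = []
for _ in range(120):            # one full trip around the 120-frame buffer
    seen.append(ptr)
    ptr = advance(ptr)

print(hex(seen[118]), hex(seen[119]), hex(ptr))
```

The last frame slot before the wrap is $7f1f, which is the value the streaming loop compares against.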

The not-so-tricky tricky part is to initialize this so that the DSP is chasing your buffer as you fill it.
To do this you want to:
1. make sure your voice is set up but not keyed on
Code: [Select]
resetVox:
    %setdsp($6c,$20) ; unmuted, echo off
    %setdsp($4c,$00) ; clear key-on all voices
    %setdsp($5c,$ff) ; key-off (mute) all voices
    %setdsp($5d,$02) ; set sample directory to $0200

    %setdsp($00,$7f) ; V0 L VOL=max
    %setdsp($01,$7f) ; V0 R VOL=max
    %setdsp($02,$b8) ; PITCH L
    %setdsp($03,$0b) ;   ... H -> $0bb8 (about 23.4 kHz; $0c00 would be exactly 24 kHz)
    %setdsp($04,$00) ; Sample #0
    %setdsp($05,$11) ; A=0 D=0
    %setdsp($06,$11) ; S=0 R=0
    %setdsp($07,$cf) ; GAIN=$1F (ignore ADSR)

    %setdsp($5c,$00) ; clear key-off all voices
    %setdsp($3d,$00) ; noise off all voices
    %setdsp($4d,$00) ; disable echo all voices
    %setdsp($0c,$7f) ; master vol L max
    %setdsp($1c,$7f) ; master vol R max
    %setdsp($2c,$00) ; echo vol L off
    %setdsp($3c,$00) ; echo vol R off

    rts
2. activate vblank (yes you need it)
Code: [Select]
    ; turn on NMI
    -   lda $4210
        bpl -
    lda #$80
    sta $4200
3. do a WAI then immediately set the buffer pointer to $1769 and send the DSP the key-on for the buffer's voice
Code: [Select]
    ; sync audio buffer pointer
    jsr resetVox                    ; reset DSP settings
    wai                             ; ensure following instructions are well away from NMI
    ldx #$1688+$e1 : stx swAUDBUF   ; reset audio buffer (one frame past start)
    %setdsp($4c,$01)                ; Voice 0 on (from audio buffer start)

With the voice activated, your asm code passes the NMI's buffer pointer as the upload address to spc_begin_upload, then reads the next 225 bytes of audio and stuffs them one byte at a time into the SPC using spc_upload_byte.

Code: [Select]
    ; stream audio (may occur outside of vblank)
    AUDBANK_HI = 2+(9*6)   ; my audio banks start at 02:8000-02:ffff through to 37:8000-37:ffff (32 KB x 54)
    ldx.w #$0000
    lda #$02    ; start bank of sample
    sta dbAUDBANK
    pha
    plb
    -   ldy swAUDBUF
        cpy #$7f1f
        bne +
            ; this is last frame of the buffer before
            ; wrap-around so force headerbyte
            ; to have loop+end flags
            jsr spc_begin_upload
            --  lda.w Sample_brr,x
                cpy.w #$00d8
                bne ++
                    ora #$03
                ++
                jsr spc_upload_byte
                inx
                cpx #$8000
                bcc ++
                    ldx.w #$0000
                    inc dbAUDBANK
                    lda dbAUDBANK
                    cmp.b #AUDBANK_HI
                    bcc +++
                        lda #$02
                        sta dbAUDBANK
                    +++
                    pha
                    plb
                ++
                cpy.w #$00e1
            bne --
            bra +++
        +
        jsr spc_begin_upload
        --  lda.w Sample_brr,x
            jsr spc_upload_byte
            inx
            cpx #$8000
            bcc +
                ldx.w #$0000
                inc dbAUDBANK
                lda dbAUDBANK
                cmp.b #AUDBANK_HI
                bcc ++
                    lda #$02
                    sta dbAUDBANK
                ++
                pha
                plb
            +
            cpy.w #$00e1
        bne --
        +++
        wai
    jmp -                       ; Make 65C816 Dizzy

The trick here is that the last buffer chunk must always have the loop and end flags set to make the buffer circular (ora #$03), so you need to check whether you are writing the buffer's end chunk and do the ora #$03 on its header byte when that is the case.
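
The header patching can be sketched in Python as follows. This is an illustration rather than a port of the asm: `patch_frame` is a hypothetical helper that takes one 225-byte frame plus the SPC address it will be written to.

```python
BUF_END = 0x8000
CHUNK = 9   # bytes per BRR chunk; the first byte of each chunk is its header

def patch_frame(frame: bytes, spc_addr: int) -> bytes:
    """Set loop+end on the chunk that ends the ring buffer, if this frame holds it."""
    out = bytearray(frame)
    for offset in range(0, len(out), CHUNK):
        if spc_addr + offset == BUF_END - CHUNK:   # header of the last chunk
            out[offset] |= 0x03                    # bit 0 = end, bit 1 = loop
    return bytes(out)

silent = bytes(225)
last = patch_frame(silent, 0x7F1F)     # the final frame slot in the buffer
print(hex(last.index(3)))              # offset of the patched header byte
```

Only the frame written at $7f1f is touched, and only at offset $d8, which matches the `cpy.w #$00d8` check in the loop above.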

Once done you WAI for the next frame and rinse, lather, repeat.

hopefully I got all the necessary code together here:

spcutil.asm:
Code: [Select]
; High-level interface to SPC-700 bootloader
;
; 1. Call spc_wait_boot
; 2. To upload data:
;       A. Call spc_begin_upload
;       B. Call spc_upload_byte any number of times
;       C. Go back to A to upload to different addr
; 3. To begin execution, call spc_execute
;
; Have your SPC code jump to $FFC0 to re-run bootloader.
; Be sure to call spc_wait_boot after that.


; Waits for SPC to finish booting. Call before first
; using SPC or after bootrom has been re-run.
; Preserved: X, Y
spc_wait_boot:
    lda #$AA
    -   cmp $2140
        bne -

    ; Clear in case it already has $CC in it
    ; (this actually occurred in testing)
    sta $2140

    lda #$BB
    -   cmp $2141
        bne -

    rts


; Starts upload to SPC addr Y and sets Y to
; 0 for use as index with spc_upload_byte.
; Preserved: X
spc_begin_upload:
    sty $2142

    ; Send command
    lda $2140
    clc
    adc #$22
    bne +       ; special case fully verified
        inc
    +
    sta $2141
    sta $2140

    ; Wait for acknowledgement
    -   cmp $2140
        bne -

    ; Initialize index
    ldy.w #0

    rts


; Uploads byte A to SPC and increments Y. The low byte
; of Y must not change between calls.
; Preserved: X
spc_upload_byte:
    sta $2141

    ; Signal that it's ready
    tya
    sta $2140
    iny

    ; Wait for acknowledgement
    -   cmp $2140
        bne -

    rts


; Starts executing at SPC addr Y
; Preserved: X, Y
spc_execute:
    sty $2142

    stz $2141

    lda $2140
    clc
    adc #$22
    sta $2140

    ; Wait for acknowledgement
    -   cmp $2140
        bne -

    rts

;
; Writes high byte of X to SPC-700 DSP register in low byte of X
;
write_dsp:
    phx
    ; Just do a two-byte upload to $00F2-$00F3, so we
    ; set the DSP address, then write the byte into that.
    ldy.w #$00F2
    jsr spc_begin_upload
    pla
    jsr spc_upload_byte     ; low byte of X to $F2
    pla
    jsr spc_upload_byte     ; high byte of X to $F3
    rts

macro setdsp(reg,val)
    ldy.w #$00F2
    jsr spc_begin_upload
    lda #<reg>
    jsr spc_upload_byte
    lda #<val>
    jsr spc_upload_byte
endmacro

To play something, find a suitable WAV somewhere, convert the thing to 16-bit mono 24 kHz, then use BRRTool to make a (much larger than SPC) .brr file from it.  Asar doesn't like crossing banks, so you need to use the file slice option of INCBIN for every 32k bank.  It's a ton of copy-and-paste to get the file into Asar, particularly because it doesn't accept math in the INCBIN slice notation.
INCBIN "file" : start - end
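
Since the slice offsets have to be written out literally, a throwaway script can generate the lines instead of copy-and-paste. The Python sketch below prints one line per 32 KB bank in the notation shown above. The number format (bare hex) and the end offset being exclusive are assumptions to check against the Asar manual, and `slice_lines` and "music.brr" are made-up names.

```python
BANK_SIZE = 0x8000   # 32 KB of ROM data per bank

def slice_lines(filename: str, total_bytes: int) -> list:
    """Build one INCBIN slice line per 32 KB bank (offset format is an assumption)."""
    lines = []
    for start in range(0, total_bytes, BANK_SIZE):
        end = min(start + BANK_SIZE, total_bytes)   # assumed exclusive end
        lines.append(f'INCBIN "{filename}" : {start:X} - {end:X}')
    return lines

for line in slice_lines("music.brr", 0x8000 * 2 + 9):
    print(line)
```

Each printed line would still need its own bank/org directive in front of it, which is left out here.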

Also I made my banks a multiple of nine so that looping back to the first bank doesn't mess the audio up due to a framing error (misaligning headerbytes on repeat).
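
The reason for the multiple of nine is easy to confirm: a 32 KB bank is not a whole number of 9-byte BRR chunks, so the data only lands back on a chunk boundary every ninth bank. A minimal Python check:

```python
BANK_SIZE = 0x8000   # bytes per bank
CHUNK = 9            # bytes per BRR chunk

leftover = BANK_SIZE % CHUNK   # bytes of a partial chunk left at the end of one bank
aligned_counts = [n for n in range(1, 55) if (n * BANK_SIZE) % CHUNK == 0]

print(leftover, aligned_counts)
```

The 54 banks used in the streaming loop above are on that list, so wrapping from the last bank to the first keeps the header bytes aligned.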

This is quite the hack the way it abuses the IPL uploader :P  and it's not perfect (it will drop out a few frames every minute or so).  However it is a very quick and dirty audio streaming code using only 65816 ASM.

April 02, 2014, 10:15:44 am - (Auto Merged - Double Posts are not allowed before 7 days.)
PS: I need to fix my SA-1 bank setup code.  I can't seem to access 80-BF or C0-FF banks in my ROM.  I've tried to set the registers to map the first two MB at 00-3f, the second two MB at 80-bf and the last 4 MB into C0-FF, which the docs say is possible, but at least in ZSNES (byuu's version) this doesn't seem to be working.  I'll post what I'm using later when I'm home.  The ROM size matters here since streaming audio gets large in a ROM rather quickly unless I limit myself to bsnes/Higan and use MSU1.

April 02, 2014, 06:45:36 pm - (Auto Merged - Double Posts are not allowed before 7 days.)
PSS: pitch needs to be 0bf5 (23914 1/16 Hz) rather than 0c00 (24 kHz) for the vblank (which by this measure is somewhere more like 59.785... Hz rather than exactly 60 Hz) to keep up.
Edit: 0bf5, not 0bb5.
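
For anyone checking the pitch values, the conversion is linear: pitch $1000 plays at 32 kHz, so each step is 32000/4096 Hz. A Python sketch follows; `pitch_to_hz` is a made-up helper, and the 400 samples per frame comes from the 25-chunk frame size used above.

```python
def pitch_to_hz(pitch: int) -> float:
    """Convert a DSP pitch register value to a playback rate ($1000 = 32 kHz)."""
    return pitch * 32000 / 0x1000

SAMPLES_PER_FRAME = 400   # 25 BRR chunks x 16 samples

for pitch in (0x0C00, 0x0BF5, 0x0BB8):
    hz = pitch_to_hz(pitch)
    print(f"${pitch:04X}: {hz} Hz, {hz / SAMPLES_PER_FRAME:.3f} frames/s consumed")
```

One pitch step is 7.8125 Hz, which is the granularity that makes an exact match to the frame rate impossible.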

April 03, 2014, 08:56:53 pm - (Auto Merged - Double Posts are not allowed before 7 days.)
PSSS: The frame dropping after a minute was due to the posted code lowering the sample rate too much (I had tweaked it to slow down a particular sample, making it too low, so the buffer write was wrapping around and eventually scribbling into the playback, causing the dropout).  The sample rate is very finicky and needs to match the SNES's buffer writing (the precise vblank frequency) as closely as possible to avoid this issue.  It would be helpful if the SNES vblank were an exact 60 Hz, but alas it is not (that is simply not the case for colour NTSC).  If the sample rate is too low, the SNES eventually scribbles into the playback (buffer overrun), but if it is too fast the SNES won't keep up and the buffer underruns.  My testing in higan's accuracy profile shows that a difference of one in pitch will transition from eventual overrun to eventual underrun (in other words you can't get it exact), but it takes several minutes to reach that state.  I think I need to modify the transfer cycle to use a counter that accumulates the fractional difference between write rate and playback rate, then drop a frame whenever it gets off by a whole frame.  Once I find the right increment the buffer will stay in sync for hours rather than minutes, making it suitable for continuous use (like an entire game).
« Last Edit: August 13, 2015, 12:31:46 pm by gauveldt »

Bregalad

Re: SNES Audio Streaming code
« Reply #1 on: April 04, 2014, 02:29:35 am »
The SPC and the main CPU are clocked by completely separate crystals in the SNES, so there is no kind of synchronization you'll be able to make implicitly - whatever you do, it'll eventually drift out of synchronization at some point.

You'll have to resort to explicit synchronization, that is, having the SPC use its timer and tell the SNES CPU how many cycles have passed, so that it can send the right amount of data.

I doubt this is possible using only the IPL transmission protocol though, although I understand the challenge behind using exclusively that.

Quote
To play something find a suitable WAV somewhere, convert the thing to 16-bit mono 24 kHz then use BRRTool to make a (much larger than SPC) .brr file from it
Just for info, BRRTool supports resampling (now with anti-aliasing) and stereo-to-mono converting, so you don't need to convert it beforehand (although if you want to double-check what the result will be before encoding, that's still an option).
« Last Edit: April 04, 2014, 02:56:54 am by Bregalad »

gauveldt

Re: SNES Audio Streaming code
« Reply #2 on: April 04, 2014, 11:33:09 am »
Quote
The SPC and the main CPU are clocked by completely separate crystals in the SNES, so there is no kind of synchronization you'll be able to make implicitly - whatever you do, it'll eventually drift out of synchronization at some point.
It should be possible to get close enough to the DSP's sample rate that the loss of synchronization is not observed in practice for several hours, by which time most games would probably encounter an audio track change, which would reset the buffer and start over again.  My main challenge is that the vblank interval is far too coarse.  However, were I to use a counter and a vblank buffer write that errs on the side of being slightly fast but drops a frame whenever the counter reaches a whole frame, I could get the time before loss of synchronization much higher than a few minutes.  The counter basically counts how far off I am from ideal; when that difference goes above a whole frame ahead, I can drop one frame to compensate and subtract the whole number off the counter (which keeps the remainder for the next pass).  Once I write a routine to display frame counts in VRAM I can figure out on what frame the buffer overruns, then calculate from that the amount the counter should advance every write (divide the frame where overrun occurs by the buffer size in frames, in this case 120, to yield the frame at which the write gets one full frame ahead of the audio rate).  Once I have proper frame dropping implemented it should go hours before losing sync (and again any track change resets this interval).
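
The frame-dropping counter described here can be sketched in Python. This is a model of the idea, not tested SNES code: `frames_written` is a made-up name, and the 1/300 lead per vblank is an invented figure standing in for whatever the measurement would give.

```python
from fractions import Fraction

def frames_written(vblanks: int, lead_per_vblank: Fraction) -> int:
    """Count frames sent over `vblanks`, skipping one each time the lead hits a whole frame."""
    lead = Fraction(0)
    written = 0
    for _ in range(vblanks):
        lead += lead_per_vblank      # how far the writer has pulled ahead
        if lead >= 1:
            lead -= 1                # keep the remainder for the next pass
            continue                 # drop this vblank's write
        written += 1
    return written

# Hypothetical measurement: the writer gains one full frame every 300 vblanks.
print(frames_written(3000, Fraction(1, 300)))
```

On the SNES itself the fraction would be a fixed-point increment added in the NMI handler, with the carry out of the top byte acting as the "skip this frame" signal.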

The variance in crystals is going to be on the order of parts per million or better, so in practice once the routines are theoretically in sync (by which I mean that I neglect the actual oscillator variances, since they can change even between production batches) the actual hardware should go quite a while before losing sync.  A difference of 1 ppm in crystals would go over 11 days before being off by one second.  With a 2 second buffer that means 22 days before a possible buffer overrun or underrun.  The game would need to play the same track constantly for the entire duration.  If a track is restarted (or the current track is stopped and a new one started) this interval resets to zero.  I think I could live with the audio briefly sputtering then skipping 2 seconds ahead (which is what an overrun condition ends up sounding like) once every three weeks. :P
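
The drift estimate is straightforward to reproduce. The Python sketch below only restates the arithmetic; `days_until_drift` is a made-up helper and the 1 ppm figure is the one assumed in the post.

```python
SECONDS_PER_DAY = 86400

def days_until_drift(drift_seconds: float, ppm: float) -> float:
    """Days of continuous play before two clocks `ppm` apart differ by `drift_seconds`."""
    elapsed_seconds = drift_seconds * 1_000_000 / ppm
    return elapsed_seconds / SECONDS_PER_DAY

print(round(days_until_drift(1, 1), 2))   # one second of drift at 1 ppm
print(round(days_until_drift(2, 1), 2))   # a full 2 second buffer at 1 ppm
```

A larger mismatch scales the interval down proportionally, so 10 ppm would cut both figures by a factor of ten.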

Also, even in the SPC itself it may not be possible to keep an accurate sample rate reference unless a sample rate is chosen that counts evenly on the 64 kHz timer such as, say, 8, 16 or 32 kHz.

We don't need to be exact (I'm not exactly writing an atomic clock here) and in fact most games don't even try (and there are numerous reports that several games do indeed crash over time due to a variety of synchronization issues that crop up over prolonged execution).

EDIT:
Well I got around to coding an actual SPC driver for this but it feels like it's using more S-CPU cycles than the IPL version... :(
And it breaks in Snes9X/Geiger (need to run it in ZSNES/bsnes/Higan now).
And it still has a mutually exclusive loss of sync in ZSNES vs bsnes/Higan (mutual exclusive meaning if I fix it in one it breaks in the other) even using the SPC timer for syncing buffer writes from the S-CPU code!!!
This turns into a real hair puller!

It looks to be an off-by-one in the timer count values.
When you write the $fa delay, having $fa=count-1 makes it overrun in bsnes/higan but works right in zsnes, and correspondingly making it work in bsnes with $fa=count makes it underrun in zsnes.  :(

driver usage:
1. when driver is ready SNES will read $33 at $2140 and $77 at $2141
2. 2143 will immediately be a count of TODO frames
    when >0 you need to send a frame (see steps 3-9)
3. to start frame write 2140=ff, 2141=ff, 2142=ff, 2143=ff
4. read a zero from 2140 to know driver acknowledges
5. for the next 222 iterations
    6. write three bytes of audio to 2140,2141,2142
    7. write iteration count (1-222 inclusive) to 2143
    8. wait for iteration count to parrot on 2140
    9. repeat from step 6 until done

That's pretty much all there is to it.
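
To make steps 5-9 concrete, here is a Python sketch of how one frame breaks down into port writes. It is only a model of the data layout: `frame_packets` is a made-up name, and the actual port writes and the wait for the echoed count are left out.

```python
FRAME_BYTES = 666   # 74 BRR chunks of 9 bytes, sent 3 bytes at a time

def frame_packets(frame: bytes) -> list:
    """Split a frame into (byte for 2140, byte for 2141, byte for 2142, count for 2143)."""
    if len(frame) != FRAME_BYTES:
        raise ValueError("a frame must be exactly 666 bytes")
    return [
        (frame[i], frame[i + 1], frame[i + 2], i // 3 + 1)   # count runs 1..222
        for i in range(0, FRAME_BYTES, 3)
    ]

packets = frame_packets(bytes(range(256)) * 2 + bytes(154))
print(len(packets), packets[0], packets[-1][3])
```

The count never reaches $ff, so a data packet cannot be mistaken for the all-$ff start-of-frame sentinel on 2143.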
Code: [Select]
SPCDriver:
arch spc700-inline

org $0400

SAMPLEDIR = $0200
STBUF     = $3728
STBUFSIZE = $48d8   ; 2072 brr chunks, 28 frames (666 bytes/frame)
STBUFMID  = $5b94   ; STBUF+STBUFHALF (middle of buffer)
STLASTBRR = $7ff7   ; STBUF+STBUFSIZE-9
STBUFTOP  = $8000   ; STBUF+STBUFSIZE
tempPtr   = $0
tempW1    = $2
tempW2    = $4
tempW3    = $6
bufWrPtr  = $8
bufHalfs  = $10
bufTODO   = $11

; macro implements a "movw immediate"
macro mvwi(mem,word)
    push a
    mov  a,#<word>&$ff
    mov  <mem>,a
    mov  a,#<word>>>8
    mov  <mem>+1,a
    pop  a
endmacro

; move 16-bit memory to memory
macro mvmw(memd,mems)
    mov <memd>,<mems>
    mov <memd>+1,<mems>+1
endmacro

; macro implements a "movw ya,#immediate"
macro i2ya(word)
    mov a,#<word>&$ff
    mov y,#<word>>>8
endmacro

; 16-bit move ya to memory
macro ya2m(mem)
    mov <mem>,a
    mov <mem>+1,y
endmacro

; immediate to dsp
macro mdsp(reg,val)
    mov a,#<reg>
    mov $f2,a
    mov a,#<val>
    mov $f3,a
endmacro

    ; set sample directory index 0
    ; to be the stream buffer
    %i2ya(STBUF)        ; movw ya,#STBUF
    %ya2m(SAMPLEDIR)
    %ya2m(SAMPLEDIR+2)
   
    ; set buffer write pointer to middle of buffer
    %mvwi(bufWrPtr,STBUFMID)
   
    ; zero out the streaming buffer
    %mvwi(tempW1,STLASTBRR)
    %mvwi(tempW2,STBUFTOP)
    %mvwi(tempW3,1)
    %i2ya(STBUF)
    %ya2m(tempPtr)

    -   mov x,#$00
        cmpw ya,tempW1  ; cmpw ya,#STLASTBRR
        bne +
            ; if on last brr chunk write loop+end flags
            mov x,#$03
        +
        push y
        push a
        mov  a,x
        mov  y,#0
        mov  (tempPtr)+y,a
        pop  a
        pop  y
        movw ya,tempPtr
        clrc
        addw ya,tempW3  ; addw ya,#$0001
        movw tempPtr,ya
        cmpw ya,tempW2  ; cmpw ya,#STBUFTOP
    bne -
   
    ; reset voice
    call spcResetVox

    %mdsp($2c,$00)  ; echo L vol=0
    %mdsp($3c,$00)  ; echo R vol=0
    %mdsp($6c,$00)  ; echo on

    ; set up frame timer (8kHz timer)
    ; counts passing of half the 74-chunk frame in time
    mov a,#148
    mov $fa,a
    mov a,#$01
    mov $f1,a       ; $FD is now counting every 37 BRRs
   
    ; voice 0 on
    %mdsp($4c,$01)

    ; wait 7 frames for the echo to stabilize
    mov x,#14
    -   mov a,$fd
        beq -
        dec x
        bne -
    ; enable echo volume and feedback
    %mdsp($2c,$28)                ; set echo vol L
    %mdsp($3c,$28)                ; set echo vol R
    %mdsp($0d,$50)                ; echo feedback

    ; signal S-CPU we are ready
    mov a,#$33
    mov $f4,a
    mov a,#$77
    mov $f5,a

    ; initialize buffer TODO count
    mov a,#0
    mov bufTODO,a
    mov bufHalfs,a

    spcMainLoop:
        mov a,$fd
        beq +
            ; count half frames
            clrc
            adc  a,bufHalfs
            mov  bufHalfs,a
            clrc
            lsr  a
            lsr  a
            mov  bufTODO,a
        +
        mov a,bufTODO
        mov $f7,a
        ; check for transfer frame sentinel from SNES
        ; f4=ff
        ; f5=ff
        ; f6=ff
        ; f7=ff
        mov a,$f7
        cmp a,#$ff
        bne +
        mov a,$f6
        cmp a,#$ff
        bne +
        mov a,$f5
        cmp a,#$ff
        bne +
        mov a,$f4
        cmp a,#$ff
        bne +
            ; got the sentinel
            ; decrement HALFS,TODO
            dec bufHalfs
            dec bufHalfs
            ; set up for transfer
            %mvwi(tempW1,STBUFTOP)
            %mvwi(tempW2,3)
            mov   x,#1
            ; acknowledge
            mov a,#0
            mov $f4,a
            -
                ; wait for nonce
                cmp x,$f7
                bne -
                mov   y,#0
                ; store bytes from f4-f6 to buffer
                mov   a,$f4
                ; must check if this is first byte
                ; of last BRR chunk of buffer
                cmp   bufWrPtr,#STLASTBRR&$FF
                bne ++
                cmp   bufWrPtr+1,#STLASTBRR>>8
                bne ++
                    ; if it is the last chunk
                    ; the LOOP and END bits need
                    ; to be set
                    or a,#$03
                ++
                mov  (bufWrPtr)+y,a
                inc   y
                mov   a,$f5
                mov  (bufWrPtr)+y,a
                inc   y
                mov   a,$f6
                mov  (bufWrPtr)+y,a
                ; acknowledge
                mov   $f4,x
                ; ready for next write
                movw  ya,bufWrPtr
                clrc
                addw  ya,tempW2     ;ya=ya+3
                %ya2m(bufWrPtr)
                inc   x
                ; check for done
                cmp   x,#223
            bne -
            ; if buffer is at end rewind
            cmpw ya,tempW1      ; cmp ya,#STBUFTOP
            bcc ++
                %i2ya(STBUF)    ; movw ya,#STBUF
                %ya2m(bufWrPtr) ; bufWrPtr=ya
            ++
        +
    jmp spcMainLoop
   
spcResetVox:
    %mdsp($6c,$20) ; unmuted, echo off
    %mdsp($4c,$00) ; clear key-on all voices
    %mdsp($5c,$ff) ; key-off (mute) all voices
    %mdsp($5d,$02) ; set sample directory to $0200
    %mdsp($6d,$80) ; echo buffer at $8000
    %mdsp($7d,$08) ; echo delay $08 (128 ms, 16 KB of echo buffer)

    %mdsp($0f,$7f)
    %mdsp($1f,$00)
    %mdsp($2f,$00)
    %mdsp($3f,$00)
    %mdsp($4f,$00)
    %mdsp($5f,$00)
    %mdsp($6f,$00)
    %mdsp($7f,$00) ; echo FIR coefficients 127,0,0,0,0,0,0,0

    %mdsp($00,$7f) ; V0 L VOL=max
    %mdsp($01,$7f) ; V0 R VOL=max
    %mdsp($02,$00) ; PITCH L
    %mdsp($03,$10) ;   ... H -> $1000 (32 kHz)
    %mdsp($04,$00) ; Sample #0
    %mdsp($05,$11) ; A=0 D=0
    %mdsp($06,$11) ; S=0 R=0
    %mdsp($07,$cf) ; GAIN=$1F (ignore ADSR)

    %mdsp($5c,$00) ; clear key-off all voices
    %mdsp($3d,$00) ; noise off all voices
    %mdsp($4d,$01) ; enable echo voice 0
    %mdsp($0c,$57) ; master vol L ($57)
    %mdsp($1c,$57) ; master vol R ($57)
    ret

arch 65816
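
The constants at the top of the driver all follow from the 666-byte frame size, which a short Python check confirms. The names mirror the driver's defines; the 8 kHz timer rate and the 32 kHz pitch are taken from the driver's own comments.

```python
STBUF, STBUFSIZE, STBUFTOP = 0x3728, 0x48D8, 0x8000
FRAME_BYTES, CHUNK = 666, 9

frames = STBUFSIZE // FRAME_BYTES        # frames held by the stream buffer
chunks = STBUFSIZE // CHUNK              # BRR chunks held by the stream buffer
stbufmid = STBUF + STBUFSIZE // 2        # initial write pointer
stlastbrr = STBUFTOP - CHUNK             # header byte of the final chunk

TIMER_HZ, TIMER_TICKS, SAMPLE_RATE = 8000, 148, 32000
chunks_per_tick_period = TIMER_TICKS * SAMPLE_RATE // (TIMER_HZ * 16)

print(frames, chunks, hex(stbufmid), hex(stlastbrr), chunks_per_tick_period)
```

So a $fa value of 148 makes the timer output step once per half frame (37 chunks), provided the timer really counts 148 ticks per step.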
« Last Edit: April 09, 2014, 01:44:45 pm by gauveldt »

Bregalad

Re: SNES Audio Streaming code
« Reply #3 on: April 09, 2014, 04:23:02 pm »
Quote
but it feels like it's using more S-CPU cycles than the IPL version...
The IPL version uses 100% of the CPU, so I don't think it's actually possible to make it any worse...
Quote
And it breaks in Snes9X/Geiger (need to run it in ZSNES/bsnes/Higan now).
And it still has a mutually exclusive loss of sync in ZSNES vs bsnes/Higan
This does not matter in any way; the only thing that counts is that it works on hardware. Then it's up to the emus to fix themselves in order to succeed at their job of emulating the SNES.

Normally if you use the S-DSP counters as an indication of the location of the sample playback's position, and transfer this information to the main CPU, it is 100% accurate and I fail to see how even an inaccurate emulator would fail to play this correctly, but I might be wrong.
Now the challenge is not to do this, which is fairly trivial, but to do this at the same time as running a sound engine on the SPC side and a game engine on the CPU side...

gauveldt

Re: SNES Audio Streaming code
« Reply #4 on: April 09, 2014, 09:40:32 pm »
In an edit to the previous post I included an SPC driver to do the timing pulses for the audio streaming.
Quote
Normally if you use the S-DSP counters as an indication of the location of the sample playback's position, and transfer this information to the main CPU, it is 100% accurate and I fail to see how even an inaccurate emulator would fail to play this correctly, but I might be wrong.
Very simply, if an emulator has incorrect timer counter code it will get this wrong.
Does the timer count from 0 to the value stored in $FA inclusive or exclusive?
Such a small fencepost change makes a huuuuge difference to anything time-sensitive like audio streaming.  The error accumulates every time the timer wraps at the count (or count-1) to 0 and advances the 4-bit counter.
Compare the following loops:
Code: [Select]
ldx #0
-   ; 0,1,2,3,4,5,6,7
    inx
    cpx #8
bcc -          ; blt -
versus
Code: [Select]
ldx #0
-   ; 0 1 2 3 4 5 6 7 8
    inx
    cpx #8
bcc - : beq - ; ble -
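
To put a number on the fencepost, here is a Python sketch that counts how many timer steps each interpretation produces over one minute. It assumes the 8 kHz timer and the $fa value of 148 from the driver; `count_wraps` is a made-up name.

```python
TIMER_HZ = 8000
TARGET = 148          # value written to $fa in the driver

def count_wraps(ticks: int, target: int, inclusive: bool) -> int:
    """Timer steps after `ticks` base ticks, for either reading of the target value."""
    period = target + 1 if inclusive else target
    return ticks // period

one_minute = TIMER_HZ * 60
exclusive = count_wraps(one_minute, TARGET, inclusive=False)
inclusive = count_wraps(one_minute, TARGET, inclusive=True)

print(exclusive, inclusive, exclusive - inclusive)
```

The two readings differ by 22 half frames after a single minute, which is why a driver tuned for one emulator drifts out of its buffer on the other.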

Quote
Now the challenge is not to do this, which is fairly trivial, but to do this at the same time as running a sound engine on the SPC side and a game engine on the CPU side...
With audio streaming there's no way to avoid the S-CPU having to transmit audio.  There is not enough RAM in 64k for any prolonged streaming.  Even simple things like extensive battle voices would exhaust the space very quickly.  1 second of audio at 32 kHz uses 18000 bytes, leaving enough space in 64k for 3.6 seconds of audio using no other code, samples, or echo.
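
The storage figures come straight from the BRR format (9 bytes per 16 samples), as this short Python check shows:

```python
SAMPLE_RATE = 32000          # Hz
ARAM_BYTES = 64 * 1024       # total SPC RAM

bytes_per_second = SAMPLE_RATE // 16 * 9          # 9 bytes per 16-sample chunk
seconds_in_aram = ARAM_BYTES / bytes_per_second   # if nothing else used the RAM

print(bytes_per_second, round(seconds_in_aram, 2))
```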

Bregalad

Re: SNES Audio Streaming code
« Reply #5 on: April 10, 2014, 03:05:18 am »
Quote
Does the timer count from 0 to the value stored in $FA inclusive or exclusive?
You should check Anomie's docs in order to be 100% sure, but I'm pretty sure that it is exclusive, that is, $01 is the fastest and $00 the slowest (it counts as a "256", not as a "0").

By reading the value in $fd-$ff, you can see how many times the timer wrapped around, but it resets itself immediately. So if you want to actually use the value (which is the case here) you should store it somewhere.

Quote
1 second of audio at 32 kHz
You can (and probably should) use a lower rate in order to consume a realistic bandwidth and ROM space. However, if you manage to do it that fast then it's all good, better audio quality.

Quote
Even simple things like extensive battle voices
The original Street Fighter II has all voices fit at once in this space (which is very impressive).

gauveldt

Re: SNES Audio Streaming code
« Reply #6 on: April 10, 2014, 05:05:07 pm »
Quote
You should check Anomie's docs in order to be 100% sure, but I'm pretty sure that it is exclusive, that is, $01 is the fastest and $00 the slowest (it counts as a "256", not as a "0").
All I can say for sure is that one of the two emulators, ZSNES or bsnes/higan, has it wrong.  My wager is that bsnes/higan has got it right, knowing how fussy byuu is about stuff like that.  Using FA=count keeps the buffer correct in bsnes/higan (it overruns if I use FA=count-1) and FA=count-1 keeps it correct in ZSNES (it underruns if I use FA=count), thus bsnes/higan is using exclusive and zsnes uses inclusive.
Quote
By reading the value in $fd-$ff, you can see how many times the timer wrapped around, but it resets itself immediately. So if you want to actually use the value (which is the case here) you should store it somewhere.
I read its value and immediately add the value read to a counter, so no worries.
PS: I periodically poll it in a loop to wait for a desired number of count-ups, usually in code that needs to wait long enough for new echo settings to take effect (240ms or so).
Quote
You can (and probably should) use a lower rate in order to consume a realistic bandwidth and ROM space. However, if you manage to do it that fast then it's all good, better audio quality.
The original Street Fighter II have all voices fit at once in this space (which is very impressive).
In mono 32 kHz the SNES will actually get frames where it does nothing (the audio buffer will not need data yet).  I have the screen displaying the APU values, the ROM ptr being read for audio and the frame count in my ROM code using the driver.
In theory it should be possible to keep up with even a stereo 32 kHz stream but that may be pushing things (might need to be in force blank with interrupts off).

PSS: I have a hunch my SPC code to read the incoming port data may be inefficient and bottlenecking the SNES while it waits for the SPC to store the values and echo the counter to indicate it is ready for another.  I need to look at it in more depth for ways to refactor the SPC-side receive/store loop.

The rate to choose is limited if I want to use rates that can count brr chunks integral to the 8 kHz timer which means 8 kHz, 16 kHz or 32 kHz.  24 kHz, which I used in the IPL based streaming code, creates a problem because it results in steps of 3 samples that won't divide down to figure out the correct brr chunk.

Street Fighter II must have them sampled down to like 8 kHz or some such.

The only thing I have not yet tried is an SPC driver designed for SNES HDMA to feed the data.  A slightly different method will be needed than I originally thought, since the SPC's loop will need to be cycle accurate to stay in sync with each scanline during the reads, and I can't use timing nonces with the HDMA if it is directly from ROM (I could from RAM, but then the S-CPU is still working to put the data in RAM, losing any benefit from the HDMA amortizing it over scanlines).  Overall it's the same idea, but I would need to write the sentinel and wait for the SPC to acknowledge immediately at the start of vblank.  The SPC code then kills enough time for the HDMA to have written four bytes to 2140-2143 on scanline 1, then reads these and waits for the next four in a carefully cycle-counted loop that stays in sync and lands the SPC reading its $F4-$F7 ports just after the HDMA has written its values (I doubt the 64 kHz timer has enough resolution for this, and it wouldn't be in sync due to the SNES and APU not sharing timers).  The SPC's loop stops at 216 scanlines (the highest possible scanline count that sends a total frame size divisible by 9).  216 scanlines sends 96 brr chunks, which is plenty.  The HDMA would probably be alternating on/off from vblank to vblank and turned on only when the SPC driver has set 2143 >0 to say it needs more in the buffer.  32 kHz needs to send around 37 or so, so there may be points where there are as many as 2 vblanks in a row not needing data.
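
The 216-scanline figure can be reproduced with a short search in Python. The 224-line limit is an assumption (the usual NTSC visible height); the 4 bytes per scanline comes from writing all four ports 2140-2143 once per line.

```python
BYTES_PER_LINE = 4     # one HDMA write to 2140-2143 per scanline
CHUNK = 9              # bytes per BRR chunk
VISIBLE_LINES = 224    # assumed upper bound on usable scanlines

usable = [n for n in range(1, VISIBLE_LINES + 1) if (n * BYTES_PER_LINE) % CHUNK == 0]
best_lines = max(usable)
chunks_per_frame = best_lines * BYTES_PER_LINE // CHUNK

print(best_lines, chunks_per_frame)
```

Since 4 and 9 share no factors, the usable counts are just the multiples of 9, and the next one up (225) would not fit in 224 lines.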

I see all sorts of hassles getting that to work however.
« Last Edit: April 11, 2014, 01:44:31 am by gauveldt »

Bregalad

Re: SNES Audio Streaming code
« Reply #7 on: April 11, 2014, 02:45:13 am »
ZSNES is definitely wrong then. You shouldn't care about the result on such an old and inaccurate emu.
Quote
24 kHz, which I used in the IPL based streaming code, creates a problem because it results in steps of 3 samples that won't divide down to figure out the correct brr chunk.
Come on, I'm not sure how you handle this, but handling arbitrary sample rates shouldn't be that hard.

Quote
slightly different method will be needed than I originally thought since the SPC's loop will need to be cycle accurate to stay in sync to each scanline during the reads
Not terribly useful if you also want to run the music/SFX engine in parallel at the same time.

Disch

  • Hero Member
  • *****
  • Posts: 2814
  • NES Junkie
    • View Profile
Re: SNES Audio Streaming code
« Reply #8 on: April 11, 2014, 01:06:35 pm »
Quote
ZSNES is definitely wrong then. You shouldn't care about the result on such an old and inaccurate emu.

I disagree.  The proper course of action here is to contact the ZSNES team and file a bug report so that it works in ZSNES.

This sounds like it'd be a single character change to fix (>= to >).  It doesn't get much easier than that.

Quote
24 kHz, which I used in the IPL based streaming code, creates a problem because it results in steps of 3 samples that won't divide down to figure out the correct brr chunk.

Set the timer for 3x whatever your scale is, then divide the result by 4.

If you are counting individual brr blocks, then you'd set the timer to 12 / $0C.  Something like this:

Code: [Select]
mov A, the_timer  ; I forget the reg
clrc
adc A, fraction
mov Y, #0
mov X, #3
div YA,X  ; I forget exactly how div works... assuming remainder in Y
    ; but remainder might be in A.  Double check
mov fraction, Y

; A now contains the number of brr blocks we've passed.  Multiply by
;  9 to get number of bytes
mov Y, #9
mul YA

; YA is now the number of bytes.  Add that to your total and write back
;  to IO regs.

gauveldt

  • Jr. Member
  • **
  • Posts: 47
    • View Profile
Re: SNES Audio Streaming code
« Reply #9 on: April 11, 2014, 01:39:02 pm »
Quote
I disagree.  The proper course of action here is to contact the ZSNES team and file a bug report so that it works in ZSNES.
This sounds like it'd be a single character change to fix (>= to >).  It doesn't get much easier than that.
In theory, yes.  In practice, ZSNES is largely unmaintained due to the nontrivial build environment needed to compile/assemble its open source code.
Edit PS: The lack of maintenance was bad enough that byuu had to fork zsnes with an unofficial build to get full 64mbit exHIROM support for SA-1 ROMs.
Quote
Set the timer for 3x whatever your scale is, then divide the result by 4.
If you are counting individual brr blocks, then you'd set the timer to 12 / $0C.  Something like this:
At a 24 kHz sample rate, each stage 1 tick of the 8 kHz timer spans 3 samples, so setting the stage 2 period to 12 results in a stage 3 period of 36 samples, or 2.25 brr blocks (16 samples per brr block).
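
The arithmetic above, combined with the carry-the-remainder idea from the quoted code, can be sketched in Python. This models the bookkeeping only and is not SPC-700 code; the function name and the choice to carry the remainder in samples are assumptions for illustration.

```python
SAMPLE_RATE = 24000      # voice playback rate in Hz
STAGE1_RATE = 8000       # 8 kHz timer, stage 1
STAGE2_TARGET = 12       # stage 2 period
SAMPLES_PER_BRR = 16

SAMPLES_PER_STAGE1_TICK = SAMPLE_RATE // STAGE1_RATE                # 3
SAMPLES_PER_COUNTER_TICK = SAMPLES_PER_STAGE1_TICK * STAGE2_TARGET  # 36

def blocks_elapsed(counter_ticks, leftover_samples=0):
    """Return (whole brr blocks passed, samples carried to the next check)."""
    samples = counter_ticks * SAMPLES_PER_COUNTER_TICK + leftover_samples
    return divmod(samples, SAMPLES_PER_BRR)

print(SAMPLES_PER_COUNTER_TICK / SAMPLES_PER_BRR)  # 2.25
blocks, carry = blocks_elapsed(1)           # 36 samples: 2 blocks, 4 samples carried
blocks2, carry2 = blocks_elapsed(3, carry)  # 108 + 4 = 112 samples: 7 blocks, 0 carried
```

Carrying the leftover samples keeps the block count exact over time: four counter ticks always add up to 9 whole blocks, however the reads are grouped.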

It's too easy to overflow that measly 4-bit counter if one gets bogged down with some other processing between checks at such high rates so I've actually been counting buffer frames (the number of brr chunks SNES sends per pass) as opposed to individual BRR chunks.  At this point my existing driver's buffer slice is 74 brr's and the HDMA experiment would send (216*4)/9=96 brr's.
There aren't enough bits of period to count a full 74, so I've actually set the period to 37 chunks (148 at 32 kHz), read the 4-bit counter into a stage one half count, and then a second stage TODO count (what gets sent to the SNES to tell it a buffer slice is needed) counts in two's, making it advance once every 74 brr's, or one buffer slice.  When this is set up to count slices of 96 brr's (the HDMA case) the SNES will use the TODO value sent to 2143 to duty cycle the HDMA (on if TODO>0, otherwise off) during each vblank.
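
A sketch of why the slice is counted in two halves, assuming the stage 2 target is an 8-bit register (so at most 256 stage 1 ticks per counter step) and 4 samples per 8 kHz tick at 32 kHz. The names are hypothetical, and the TODO handling is only a model of the description above.

```python
SAMPLES_PER_BRR = 16
SAMPLES_PER_STAGE1_TICK = 32000 // 8000   # 4 samples per 8 kHz tick at 32 kHz
MAX_STAGE2_TARGET = 256                   # assumed: 8-bit target, 0 meaning 256

def stage2_target_for(brr_blocks):
    """Stage 1 ticks needed to span the given number of brr blocks."""
    return brr_blocks * SAMPLES_PER_BRR // SAMPLES_PER_STAGE1_TICK

class SliceCounter:
    """Counts half slices (37 brr's) and bumps TODO once per full slice (74 brr's)."""
    def __init__(self):
        self.half = 0
        self.todo = 0

    def on_counter_ticks(self, ticks):
        self.half += ticks
        self.todo += self.half // 2   # two half counts make one buffer slice
        self.half %= 2

print(stage2_target_for(74))  # 296, does not fit in the target register
print(stage2_target_for(37))  # 148, fits
```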
Quote
Not terribly useful if you also want to run the music/SFX engine in parallel at the same time.
It can do other things outside of handling the transfer command code (sent by the SNES prior to starting the HDMA), but the transfer loop itself would be critical and have to stay tight to the scanlines sent by the HDMA.  It has to ensure it reads the ports just after the HDMA finishes the write on the corresponding port, and stay in sync to maintain this loop until the scanlines are exhausted.  Then it can do other processing until the next transfer command code.  It would need to rely on the timers a lot more to keep music in step because it needs to drop everything to service the HDMA inputs after receiving the "I need your full attention for the next 216 scanlines" command.  Also, I think the first scanline's data needs to be another sentinel (so now we send 1+216 scanlines per pass) for the SPC to line up the scanline loop, due to the variation in vblank timing and in when the SNES might actually get the command to the SPC (and the SPC notice it).  The first command tells it to divert attention; the sentinel is a sync.  216 scanlines*4 bytes is a multiple of 9 (aligned to brr chunks), so any 4-byte value with the last byte unequal to the last byte of the command code and an invalid BRR range ($Dx, $Ex or $Fx) in the first byte sent (2140/f4's byte) works as a sentinel.  The SPC can shortcut by spinning until f7 is equal to the expected value of the sentinel's last byte (the sentinel's last byte must be unequal to the last byte of the command code or any of the complements you might read if you snag the SNES/APU port access race condition).  The loop to read each scanline's data is going to be incredibly tight.  It might even need to be unrolled (lots of macro abuse).
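
The sentinel constraints described above can be written down as a small check. This Python sketch only covers the two explicit rules (invalid BRR range in the first byte, last byte different from the command's last byte); it does not model the port race "complements", and the command bytes used below are made up for the example.

```python
def is_valid_sentinel(sentinel, command):
    """Check a 4-byte sentinel against a 4-byte command code (both bytes-like)."""
    if len(sentinel) != 4 or len(command) != 4:
        return False
    invalid_range = (sentinel[0] >> 4) >= 0xD   # $Dx, $Ex or $Fx header
    distinct_tail = sentinel[3] != command[3]   # SPC spins on f7 for this byte
    return invalid_range and distinct_tail

COMMAND = bytes([0x01, 0x00, 0x00, 0xA5])   # hypothetical command code

print(is_valid_sentinel(bytes([0xF0, 0x00, 0x00, 0x3C]), COMMAND))  # True
print(is_valid_sentinel(bytes([0xC0, 0x00, 0x00, 0x3C]), COMMAND))  # False, $Cx is a valid range
print(is_valid_sentinel(bytes([0xF0, 0x00, 0x00, 0xA5]), COMMAND))  # False, same last byte
```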

I did say I see numerous issues in trying to get this working reliably.
« Last Edit: April 11, 2014, 02:09:00 pm by gauveldt »

Bregalad

  • Hero Member
  • *****
  • Posts: 2747
    • View Profile
Re: SNES Audio Streaming code
« Reply #10 on: April 11, 2014, 02:24:41 pm »
Quote
I disagree.  The proper course of action here is to contact the ZSNES team and file a bug report so that it works in ZSNES.

This sounds like it'd be a single character change to fix (>= to >).  It doesn't get much easier than that.
Absolutely. In fact, I meant that you shouldn't care about the result, which doesn't mean you shouldn't report the bug to the authors (then it's up to them whether they want to maintain it or not).

gauveldt

  • Jr. Member
  • **
  • Posts: 47
    • View Profile
Re: SNES Audio Streaming code
« Reply #11 on: April 12, 2014, 01:00:34 pm »
Looks like the issue I was having in Geiger was a side effect of Geiger setting the GUI's 'disable SPC emulation' option whenever an SPC stop instruction is encountered.  Probably not an ideal behavior (my use case for STOP on the SPC side is debugging: at some point of execution I can dump a 'hello world' number or up to 4 other relevant values, then do the STOP to basically freeze state, and the SNES can then display the dumped values via 2140-2143 on the screen).  Since I didn't know Geiger had enabled a setting that persisted after quitting Geiger9X, it appeared to be broken when what was really happening is that it was in "audio disable" mode.

It started running somewhat properly again once I reenabled SPC-700 emulation.

Looks like in Geiger I get an overrun when the code is written to assume SPC timers count exclusive of the stage 2 period setting, meaning it's counting to P-1 on the timers.  Three emulators, each doing it differently now.

When my code assumes the timer counts 0 to (P-1) inclusive (P==1==fastest):

Emulator       Timer Fencepost    Result
BSNES/Higan    0 to P-1           stays in sync
ZSNES          0 to P             underruns
Geiger9X       0 to P-2 (???)     overruns

(???): It could be 0 to P-2, or, 1 to P-1, or any shorter interval
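
To see why the fencepost matters, here is a rough Python model of the buffer drift per counter step. It assumes the 32 kHz setup with a stage 2 target of 148 (37 brr's per step) described earlier in the thread, and treats each emulator's behavior as an effective period of P, P+1 or P-1 stage 1 ticks; the P-1 figure for Geiger9X is a guess, as noted above.

```python
from fractions import Fraction

SAMPLES_PER_BRR = 16
SAMPLES_PER_STAGE1_TICK = 4   # 32 kHz voice, 8 kHz timer
P = 148                       # stage 2 target the code writes
SENT_PER_STEP = 37            # brr's the code assumes were consumed per counter step

def consumed_per_step(effective_period):
    """brr blocks the DSP actually plays between two counter increments."""
    return Fraction(effective_period * SAMPLES_PER_STAGE1_TICK, SAMPLES_PER_BRR)

for name, period in [("BSNES/Higan", P), ("ZSNES", P + 1), ("Geiger9X", P - 1)]:
    drift = consumed_per_step(period) - SENT_PER_STEP
    state = "in sync" if drift == 0 else ("underrun" if drift > 0 else "overrun")
    print(name, drift, state)
```

Under these assumptions the drift is a quarter of a brr per counter step in either direction, which is small but accumulates until the read and write positions collide.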

I do not know how real hardware responds.

Disch

  • Hero Member
  • *****
  • Posts: 2814
  • NES Junkie
    • View Profile
Re: SNES Audio Streaming code
« Reply #12 on: April 12, 2014, 03:31:35 pm »
Quote
I do not know how real hardware responds.

This is problematic.

You basically have 3 choices:

1)  Find out how this works on a real SNES... and get your code working on a real SNES first and foremost.

2)  Pick and choose which emulator you want this to work on, and get it working in that emu.

3)  Try to support all emus by doing some "emulation detection" (didn't you make a thread about this before?  Now at least you know of a way to tell the emus apart:  the SPC timer)


Note that solution #1 is really the only "good" solution here.  Doing #2 (or even #3) pretty much ensures that your hack/rom will fade into obscurity as emulation improves.

#1 is really the only "future proof" way to go about this.

gauveldt

  • Jr. Member
  • **
  • Posts: 47
    • View Profile
Re: SNES Audio Streaming code
« Reply #13 on: April 12, 2014, 07:14:34 pm »
Quote
1)  Find out how this works on a real SNES... and get your code working on a real SNES first and foremost.
I no longer have a PC capable of using my old 24mbit cartridge emulation hardware (new PCs no longer have ISA ports - not to mention the card edge connectors connecting from the PC card's ribbon cables to the cartridge board have gotten very flakey over time and barely work).  SD2SNES is beyond my means right now ($200 USD+) until green carpet season.  The only hope for this seeing the address/data buses of a real SNES is for someone with the SD2SNES hardware to build and test from source (commenting out the SA-1 bank setup for 64mbit, reducing the rom+audioImage to 32mbit and adjusting the ROM header to match).
Quote
2)  Pick and choose which emulator you want this to work on, and get it working in that emu.
Since my code is written to assume timers tick from 0 to (P-1) [inclusive] before incrementing the counter and restarting from 0, it currently works with bsnes/higan.  In theory (according to anomie's APU docs) this matches the behavior on real hardware.
Quote
3)  Try to support all emus by doing some "emulation detection" (didn't you make a thread about this before?  Now at least you know of a way to tell the emus apart:  the SPC timer)
All I got from that thread were individuals basically saying "don't put in emulator specific hacks."  Needless to say, I didn't get any useful code I can put in to do the detection [d4s has such a detection routine but he keeps his code shrouded in secrecy to maintain security by obscurity for the nag screen he uses when his romhacks are being used on real hardware :( ].  Though we know the timer is at issue across emulators, I don't see any easy way to automatically detect the offset short of asking the human listening if the audio is okay, garbles-then-skips-ahead-in-time or garbles-then-skips-back-in-time.
« Last Edit: April 12, 2014, 07:44:02 pm by gauveldt »

Disch

  • Hero Member
  • *****
  • Posts: 2814
  • NES Junkie
    • View Profile
Re: SNES Audio Streaming code
« Reply #14 on: April 13, 2014, 02:05:57 am »
Quote
I no longer have a PC capable of using my old 24mbit cartridge emulation hardware

Yeah I don't have any way to test this on actual hardware either.

Really, though, you're kind of playing with fire on this.  You're doing something sort of cutting edge that depends on very specific behavior on the system.  And you admittedly aren't sure of what that behavior actually is.  So you're flying blind.

Running on a real system is the only way to know for sure.  Whether or not that's an option that is available to you is another matter.  Maybe it isn't available.

Quote
Though we know the timer is at issue across emulators  I don't see any easy way to automatically detect the offset short of asking the human listening if the audio is okay,

Anything that can be observed by the CPU can be detected.  The timer can be observed by the CPU, therefore it can be detected.

We know the 1st stage timer is clocked every 64 cycles (on the slow timers anyway).
We know that if we set the target to $02 we expect the 2nd stage timer to be clocked once the 1st stage timer is clocked twice (so every 128 cycles)

So if you set the timer, then wait 512 cycles in a timed loop... you would expect to read back '4' from the timer reg.

On emus that clock faster or slower... you'll read back a different value.  That value can be your emulator ID
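
A Python model of that readback, using the figures from this post (64 cycles per stage 1 tick, target $02, a 512-cycle wait) and treating each emulator's fencepost as an effective period. The 4-bit wrap of the counter is included; the mapping of fenceposts to periods is taken from the table earlier in the thread and is an assumption, not a measurement.

```python
CYCLES_PER_STAGE1_TICK = 64   # figure quoted in the post above
TARGET = 2
WAIT_CYCLES = 512

def counter_readback(effective_period, wait_cycles=WAIT_CYCLES):
    """Value read from the 4-bit counter after waiting wait_cycles."""
    stage1_ticks = wait_cycles // CYCLES_PER_STAGE1_TICK
    return (stage1_ticks // effective_period) & 0x0F

# Each fencepost convention yields a different readback, usable as an ID.
SIGNATURES = {
    counter_readback(TARGET): "counts 0 to P-1 (expected)",
    counter_readback(TARGET + 1): "counts 0 to P (slow)",
    counter_readback(TARGET - 1): "counts 0 to P-2 (fast)",
}
```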

gauveldt

  • Jr. Member
  • **
  • Posts: 47
    • View Profile
Re: SNES Audio Streaming code
« Reply #15 on: April 13, 2014, 03:19:28 am »
Quote
We know the 1st stage timer is clocked every 64 cycles (on the slow timers anyway).
We know that if we set the target to $02 we expect the 2nd stage timer to be clocked once the 1st stage timer is clocked twice (so every 128 cycles)
So if you set the timer, then wait 512 cycles in a timed loop... you would expect to read back '4' from the timer reg.
On emus that clock faster or slower... you'll read back a different value.  That value can be your emulator ID
IIRC the cycle counts of SPC700 instructions are not widely verified, making it difficult to do a timed loop.

ZSNES is rather easy to catch.  I noticed in ZSNES voice 0's OUTX register remains zero even when voice 0's samples are nonzero.  Easy way to catch ZSNES is to have voice 0 loop the brr chunk c3 78 78 78 78 78 78 78 78 (with GAIN set to max+'ignore envelope' bit) and after allowing enough time for the DSP to have filled the buffer and have performed a write to OUTX, read OUTX and check for it to be nonzero.

In any other case OUTX could be used to test the timing:
1. my write slices are 74 brr's at a time
2. I set the target of the T0 8 kHz timer to 148, which counts 37 brr's at 32 kHz
3. fill the buffer with the pattern:
    c0 11 11 11 11 11 11 11 11     x74
    c0 22 22 22 22 22 22 22 22     x74
    c0 33 33 33 33 33 33 33 33     x74
   ...
    c0 77 77 77 77 77 77 77 77     x74
   repeat from c0 11 11 ... entry
4. start the voice (at pitch $1000 or 32 kHz), wait for OUTX to be nonzero (I think OUTX should be yielding a $10 on the c0 11 11 ... BRR), reset the counter, then store the values read from OUTX over the next several counter advances (counting in 2's to match the 74-brr-aligned pattern written to the buffer)
5. I would think if the timer is in sync you'd get OUTX values $10,$20,$30,...,$70,etc
6. if the target fencepost is +1 and underrunning there'd be a duplicate for instance $10,$20,$30,$30,$40,$50...
7. if the target fencepost is -1 and overrunning there'd be a skip in values for instance $10,$20,$30,$50,$60,...
(OUTX is written before volume multiplication so the voice volume may be zero'ed during this test)
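
The test pattern and the interpretation of the OUTX readings in steps 3 to 7 could be sketched like this in Python. It is a model of the logic, not SPC-700 code; the function names are hypothetical, and the expected $10 steps follow the guess in step 4.

```python
BRRS_PER_SLICE = 74

def build_pattern():
    """7 groups of 74 brr blocks: header c0, then 8 bytes of 11, 22, ... 77."""
    data = bytearray()
    for n in range(1, 8):
        block = bytes([0xC0] + [n * 0x11] * 8)
        data += block * BRRS_PER_SLICE
    return bytes(data)

def classify(outx_values):
    """Map a run of OUTX readings to 'in sync', 'underrun' or 'overrun'."""
    steps = [v >> 4 for v in outx_values]   # $10 -> 1, $20 -> 2, ...
    for prev, cur in zip(steps, steps[1:]):
        delta = (cur - prev) % 7            # pattern repeats after 7 groups
        if delta == 0:
            return "underrun"               # duplicate value: fencepost +1
        if delta != 1:
            return "overrun"                # skipped value: fencepost -1
    return "in sync"
```

For example, `classify([0x10, 0x20, 0x30, 0x30, 0x40])` reports an underrun and `classify([0x10, 0x20, 0x30, 0x50])` reports an overrun, matching steps 6 and 7.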

It would be VERY interesting for some brave soul to make a small SPC driver to test this on actual hardware and report the result.

Bregalad

  • Hero Member
  • *****
  • Posts: 2747
    • View Profile
Re: SNES Audio Streaming code
« Reply #16 on: April 13, 2014, 03:54:10 am »
If you have something specific to test I can test it for you with my Super Power Pak on my PAL SNES.
I however have no NTSC SNES to test it on.

gauveldt

  • Jr. Member
  • **
  • Posts: 47
    • View Profile
Re: SNES Audio Streaming code
« Reply #17 on: April 13, 2014, 06:11:11 am »
Quote
If you have something specific to test I can test it for you with my Super Power Pak on my PAL SNES.
I however have no NTSC SNES to test it on.
How does a PAL SNES change in the SPC?
Are the timers or pitch-to-sample-rate any different?

Disch

  • Hero Member
  • *****
  • Posts: 2814
  • NES Junkie
    • View Profile
Re: SNES Audio Streaming code
« Reply #18 on: April 13, 2014, 10:01:19 am »
Quote
IIRC the cycle counts of SPC700 instructions are not widely verified making it difficult to do a timed loop.

I'm pretty sure Anomie and Blargg went through all of them years ago and verified the timing.  If emus have it wrong, then all that will mean is that the timer will read back a different value than expected... which is already what we're looking for.

In a detection system it's unimportant whether emus are accurate.  What's important is that we get different output for emu A vs emu B vs the real system.

Quote
In any other case OUTX could be used to test the timing:

That's an interesting idea!  That could definitely work.

gauveldt

  • Jr. Member
  • **
  • Posts: 47
    • View Profile
Re: SNES Audio Streaming code
« Reply #19 on: April 13, 2014, 11:11:39 am »
Quote
I'm pretty sure Anomie and Blargg went through all of them years ago and verified the timing.  If emus have it wrong, then all that will mean is that the timer will read back a different value than expected... which is already what we're looking for.

In a detection system it's unimportant whether emus are accurate.  What's important is that we get different output for emu A vs emu B vs the real system.
Setting a voice to a specially generated sample, then setting a timer and checking OUTX whenever the stage 3 counter advances to see which samples the timer hits land on, seems to be the way to go.  It can compute the needed stage 2 target (except on ZSNES, since OUTX is always zero) in an emulator-and-snes-hardware-neutral manner.

I don't need to detect which particular emulator I'm on so much as I need to know what value to set for the timer's stage 2 target.  Knowing I'm on an emulator for nag screens or other chicanery isn't really my interest here.
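
Once the fencepost offset is known, picking the value to write is simple. A Python sketch, assuming the offset is expressed as (effective period minus written target), so +1 for the ZSNES-style underrun case and -1 for the Geiger9X-style overrun case; the names are hypothetical.

```python
def stage2_target(desired_period, fencepost_offset):
    """Value to write so the timer's effective period equals desired_period."""
    target = desired_period - fencepost_offset
    if not 1 <= target <= 256:
        raise ValueError("target out of range for an 8-bit timer register")
    return target & 0xFF   # assumed: a written 0 means a period of 256
```

With the 148-tick period used in this thread, that gives 148 when the timer is in sync, 147 for a +1 fencepost and 149 for a -1 fencepost.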