News: 11 March 2016 - Forum Rules
Current Moderators - DarkSol, KingMike, MathOnNapkins, Azkadellia, Danke

### Author Topic: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)  (Read 6105 times)

#### gauveldt

• Jr. Member
• Posts: 47
##### 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« on: April 07, 2014, 02:21:37 pm »
I've seen others in various romhacking posts and other places avoid writing routines that convert binary numbers to decimal to display results of calculations or otherwise on screen.  Any decent game needs to do this really and it's not all that hard to do.  One will note that professional games rarely store values in RAM using BCD format as it would use up a lot of extra memory.

The way to do this very quickly is an algorithm called Shift and Add-3 or "double dabble".

It's not that hard to explain in pseudocode:
Edit: so much for pseudo code.... I basically recalled the entire ASM from memory while house sitting and am away from the computer I actually write code on... o_O

1. start with your binary number padding with 3 nybbles or so for every 3 bits (or portion) of input (or you know that a 32-bit unsigned 4 byte integer has a max value of 4.29 billion and requires 10 digits hence 10 nybbles).

That's the original binary value with zero bytes (10 nybbles after it):
in0 in1 in2 in3 nyb01 nyb23 nyb45 nyb67 nyb89 = 9 bytes

B2D_DIGITS = 10 bytes (stick this somewhere in wram or bw-ram)

we're going to use 16-bit operations so let's pad that odd byte with a 00
in0 in1 in2 in3 nyb01 nyb23 nyb45 nyb67 nyb89 00 = 10 bytes
clear nyb00 thru to the pad byte after nyb89 inclusive to \$00
Code: [Select]
`; precondition: in0 thru in3 are set to the 32-bit value to convert;   -- pad lesser sizes (ie: bytes, words, 24-bit ptrs) with \$00 in higher order inX values)Bin2Dec:-   ldx.w #\$09    lda #\$00    sta nyb01,x    dex    bpl -`
set the shift count to 0:

Code: [Select]
`stz B2D_SHIFTCOUNT      ; just define this to be somewhere in WRAM (or BW-RAM if you use SA-1)`
2. shift everything left starting from in0,in1 and working towards nyb89 00
Code: [Select]
`B2D_Loop:rep #\$20clcrol in0        ; +in1rol in2        ; +in3rol nyb01   ; +nyb23rol nyb45   ; +nyb67rol nyb89   ; +00sep #\$20`3. count this shift.  we are done if this is 32nd shift.
Code: [Select]
`inc B2D_SHIFTCOUNTlda B2D_SHIFTCOUNTcmp #\$20beq B2D_UnpackDigits`4. now the fun part...  we scan each byte in nyb01 thru nyb89 for nybbles >= 5 and add 3 any that are
instead of trying to mess with a whole shwak of logic ops shifts, pushes, pops, etc I used a lookup table to shirnk it down to only a few instructions, the most complicated being to turn the read nybbles into a pointer value (in my code I tend to leave the index regs at 16 bits.  The 160-byte lookup table will be described at the end).  NB: The order you do the adds does not matter only that you check every nybble.
Code: [Select]
`ldy.w #\$04-   lda nyb01,y    xba    lda #0    xba       ; makes xxnn into 00nn    rep #\$20    tax    sep #\$20    lda Add3Tbl,x    sta nyb01,y    dey    bpl -`5. Repeat for remaining shifts
Code: [Select]
`    bra B2D_Loop`6. Technically we are done the heavy part but these will be awkward to use in printing functions and they will need to unpack the nybbles to display text so we may as well do the unpacking in the same routine:
Code: [Select]
`B2D_UnpackDigits:    ; we will unpack so the MSB digits come first as humans would read them    ldx.w #9        ; points to digit to store lsd    ldy.w #0        ; points to nybbles lsb-   lda nyb01,y    pha    and #\$0f    sta B2D_DIGITS,x    dex    pla    lsr : lsr : lsr : lsr    and #\$0f    sta B2D_DIGITS,x    dex    iny    cpy #\$05    bne -`7. That's all, folks
Code: [Select]
`    rts`
The look up table:

first 16 bytes the table looks as follows (based on a truth table for shift and add-3):
\$00,\$01,\$02,\$03,\$04,\$08,\$09,\$0a,\$0b,\$0c
For the table I've zeroed the unreachable entries (it needs to be padded to 16 bytes)
this is the horizontal part of the table that takes care of the low nybble...
now we repeat the pattern with the upper nybble counting the same pattern vertically:
Code: [Select]
`Add3Tbl:db \$00,\$01,\$02,\$03,\$04,\$08,\$09,\$0a,\$0b,\$0c,\$00,\$00,\$00,\$00,\$00,\$00db \$10,\$11,\$12,\$13,\$14,\$18,\$19,\$1a,\$1b,\$1c,\$00,\$00,\$00,\$00,\$00,\$00db \$20,\$21,\$22,\$23,\$24,\$28,\$29,\$2a,\$2b,\$2c,\$00,\$00,\$00,\$00,\$00,\$00db \$30,\$31,\$32,\$33,\$34,\$38,\$39,\$3a,\$3b,\$3c,\$00,\$00,\$00,\$00,\$00,\$00db \$40,\$41,\$42,\$43,\$44,\$48,\$49,\$4a,\$4b,\$4c,\$00,\$00,\$00,\$00,\$00,\$00db \$80,\$81,\$82,\$83,\$84,\$88,\$89,\$8a,\$8b,\$8c,\$00,\$00,\$00,\$00,\$00,\$00db \$90,\$91,\$92,\$93,\$94,\$98,\$99,\$9a,\$9b,\$9c,\$00,\$00,\$00,\$00,\$00,\$00db \$a0,\$a1,\$a2,\$a3,\$a4,\$a8,\$a9,\$aa,\$ab,\$ac,\$00,\$00,\$00,\$00,\$00,\$00db \$b0,\$b1,\$b2,\$b3,\$b4,\$b8,\$b9,\$ba,\$bb,\$bc,\$00,\$00,\$00,\$00,\$00,\$00db \$c0,\$c1,\$c2,\$c3,\$c4,\$c8,\$c9,\$ca,\$cb,\$cc`You can safely lose the last six 00 bytes on the final (\$cx) row, making this a 154-byte table.

There now everyone should be able to write their players' experience and money to the screen in decimal rather than hex
« Last Edit: April 07, 2014, 02:39:39 pm by gauveldt »

#### Disch

• Hero Member
• Posts: 2814
• NES Junkie
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #1 on: April 09, 2014, 12:31:17 pm »
How does this perform compared to the more traditional "keep dividing by 10 until you run out of data" approach?  While clever, it seems like this would be slower (and with the LUT, take up more ROM space).  This is a pretty sizable loop that has to run 32 times... whereas the division loop would only have to run at maximum 10 times.

#### gauveldt

• Jr. Member
• Posts: 47
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #2 on: April 09, 2014, 01:19:39 pm »
How does this perform compared to the more traditional "keep dividing by 10 until you run out of data" approach?  While clever, it seems like this would be slower (and with the LUT, take up more ROM space).  This is a pretty sizable loop that has to run 32 times... whereas the division loop would only have to run at maximum 10 times.
Staying in integer land with contstant complexity and no divisions to require the multiply/divide hardware just to write out numbers seems like a suitable tradeoff for the small lookup table.

Not using the hardware, the divide by 10 becomes a loop to count how many tens count to the input value per pass, making it O(n*n).

The routine may be optimized for specific numeral sizes (ie: 16-bit word maximums could be use a version of the above algorithm only using 16 passes).

#### Disch

• Hero Member
• Posts: 2814
• NES Junkie
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #3 on: April 09, 2014, 01:49:09 pm »
Constant complexity is not necessarily a good thing.  "Early out" approaches are typically better performers.

And if the division hardware is there... why not use it?  Especially if using it results in faster and smaller code.

Measuring performance is pretty easy on the SNES.  There's really only 2 things you'd need to look at:  cycles and bytes.  IE:  how fast is the code, and how big is it.

There's some wiggle room where some games might want faster code at the cost of a few extra bytes... and other games might want smaller code that runs a little slower.  But if an implementation is both big and slow, it's not favorable.

I'm not trying to be judgmental or anything.  This is definitely an interesting approach... I'm just curious as to how efficient it is.  If it doesn't perform as well as the divide-by-10 approach, then it doesn't seem very practical to me.

I might whip up a divide-by-10 routine later and tally up the cycle count to see how it compares.

Quote
Not using the hardware, the divide by 10 becomes a loop to count how many tens count to the input value per pass, making it O(n*n).

Yeah but not using the hardware is retarded.  That'd be like trying to do this without using ROL.  Sure you could do it, but what's the point?

Quote
The routine may be optimized for specific numeral sizes (ie: 16-bit word maximums could be use a version of the above algorithm only using 16 passes).

Well... the 16-bit divide-by-10 approach would have 5 passes... at most (frequently it would have less -- early outs are good)

#### assassin

• Full Member
• Posts: 146
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #4 on: April 27, 2014, 07:41:16 am »
i like Andrew Jacobs' "More Hexadecimal to Decimal Conversion" approach here:

http://www.6502.org/source/

you'd want to modify it to include 16-bit instructions, as well as to handle 32-bit input and output variables.  should be a trivial adjustment.

by the author's admission, it uses somewhat more cycles.  but on the bright side, it takes less space and doesn't make my eyes bleed.

i understand nothing from the original post here, and regret having even looked at it, but will offer that the "lda #\$00" can be moved in front of the "Bin2Dec" loop.

#### gauveldt

• Jr. Member
• Posts: 47
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #5 on: April 27, 2014, 11:26:53 am »
i understand nothing from the original post here, and regret having even looked at it, but will offer that the "lda #\$00" can be moved in front of the "Bin2Dec" loop.
'
http://en.wikipedia.org/wiki/Double_dabble
http://people.ee.duke.edu/~dwyer/courses/ece52/Binary_to_BCD_Converter.pdf

April 27, 2014, 11:46:46 am - (Auto Merged - Double Posts are not allowed before 7 days.)
Well... the 16-bit divide-by-10 approach would have 5 passes... at most (frequently it would have less -- early outs are good)

So this is why games slow down in the menu when you are at 9999/9999 HP, 999/999 MP and have 9,999,999 GP.
« Last Edit: April 27, 2014, 11:46:46 am by gauveldt »

#### FinS

• Full Member
• Posts: 192
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #6 on: April 28, 2014, 12:36:08 am »
This might be too simple but you could switch to decimal mode and bitshift and add the decimal value of each shifted bit. It would require approximately 42 operations at about 4 cycles per op for the first 13 bits then 21 more operations for the last 3 bits because the converted decimal value would exceed the 16 bit limit. So in total to convert a 16 bit number it would only cost 63 operations and around 250 cycles.

#### assassin

• Full Member
• Posts: 146
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #7 on: April 28, 2014, 01:53:04 am »
Quote from: gauveldt

thanks.  i should have hit those links before having a brain meltdown from staring at the code.  my bad.

----

Quote from: FinS
This might be too simple but you could switch to decimal mode and bitshift and add the decimal value of each shifted bit.

that's what the code i linked does.  it doesn't separate the 16 iterations into the 13/3 you propose, but just does all 16 the same way.  so it's giving up a few dozen cycles in exchange for saving about a dozen bytes.

• Hero Member
• Posts: 2753
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #8 on: April 28, 2014, 02:15:26 am »
Quote
Not using the hardware, the divide by 10 becomes a loop to count how many tens count to the input value per pass, making it O(n*n).
No this is completely wrong. You can divide an integer by 10 without any dividing instruction/hardware and in constant time.

If you're interested as an example in how exactly I can link to here (I wrote that) it's not 65816 it's plain old 6502, but the concept applies to all CPUs without a divide instruction.
In all simplicity, this means you have to compare with 10, 20, 40, 80, etc... (in reverse order) until the max. value you want to support for the quotient.
« Last Edit: April 28, 2014, 09:22:25 am by Bregalad »

#### FinS

• Full Member
• Posts: 192
##### Re: 65816 (SNES) Shift and Add-3 (converts binary to bcd/decimal)
« Reply #9 on: April 30, 2014, 02:17:49 am »
Here's a method I got out of a game then altered for optimization. It just subtracts the hex values of each decimal digit and counts how many times it can be subtracted. Then places the digits one per byte starting at \$20 ram.  For the largest number (\$FFFFFFFF = 4,294,967,295) it takes 776 operations to do the conversion. It could be further optimized by using another subroutine for the last 4 which don't require math on the 16 high bits.

Code: [Select]
`//enter hex equivalents for each decimal digit in \$04 and \$06, largest first.al.xl REP #\$30                 STZ \$0A     LDX \$0A LDA #\$CA00  STA \$04     LDA #\$3B9A   STA \$06     JSR SUBTR LDA #\$E100   STA \$04     LDA #\$05F5   STA \$06     JSR SUBTR LDA #\$9680   STA \$04     LDA #\$0098   STA \$06     JSR SUBTR LDA #\$4240   STA \$04     LDA #\$000F   STA \$06     JSR SUBTR LDA #\$86A0   STA \$04     LDA #\$0001   STA \$06     JSR SUBTR LDA #\$2710   STA \$04     STZ \$06     JSR SUBTR LDA #\$03E8   STA \$04     STZ \$06     JSR SUBTR LDA #\$0064   STA \$04     STZ \$06     JSR SUBTR LDA #\$000A   STA \$04     STZ \$06     JSR SUBTR .as SEP #\$20 LDA \$00 STA \$20,X RTS    //subtract equivalents from hex number in \$00 and \$02, increment base 10 digitSUBTR.al.xl REP #\$30   STZ \$08   YOKE SEC        LDA \$00    SBC \$04    TAY LDA \$02    SBC \$06    BCC VALE   STY \$00 STA \$02 INC \$08    BRA YOKE    VALE.as SEP #\$20   LDA \$0A   // \$0a is flag to check if a number has come up yet BNE XRAY   LDA \$08    BNE ZOO  LDA #\$20    // \$20 is tile value for empty space if no numbers have come up yet BRA WHY   ZOO   INC \$0A  XRAY  LDA \$08   WHY   STA \$20,x  INX       .al REP #\$20                 RTS          `