
Author Topic: How to make a minimal N64 ROM?  (Read 4422 times)

diaspora

How to make a minimal N64 ROM?
« on: September 20, 2015, 02:56:06 am »
When I started out learning NES programming/hacking, I followed a tutorial that showed how to construct the header for the ROM file, how to do basic ASM coding, and how to draw a sprite to the screen. Looking at N64 programming guides, it all seems to be "download this toolchain, write a program in C and compile with GCC, and bam, working N64 ROM", which isn't what I'm looking for. I've been able to find r4300i ASM tutorials and assemblers, but nothing on the basics of what makes an N64 ROM. What do I need for a header (and is there an additional header for the emulator to use, or is it just a raw ROM file)? How do I read data from the cart, get input from the controllers, and send commands to the RCP? Thanks much.

FAST6191

Re: How to make a minimal N64 ROM?
« Reply #1 on: September 20, 2015, 06:44:01 am »
I'm afraid my knowledge of N64 MIPS and ROM formats is more theoretical than what you look like you are aiming for here (think turning AR codes into hard patches and making basic program flow changes, rather than having the non-CPU registers and their formats memorised). However, would " http://imrannazar.com/The-Smallest-NDS-File but for the N64, and if you have it then the N64 equivalent of http://problemkaputt.de/gbatek.htm as well" be a fairly accurate summary of what you want?

Zoinkity

Re: How to make a minimal N64 ROM?
« Reply #2 on: September 20, 2015, 02:46:59 pm »
To be completely fair, a "minimal" N64 ROM is a header and a bootstrap.  The header is the first 0x40 bytes.  Very little of it is necessary for the console, but a different set of the info is necessary for most emulators.  The next 0xFC0 bytes are the cart's bootstrap, a program loaded by PIFROM during the boot sequence that initializes the system, sets up rdram, sets various system values at 80000300, and loads the actual game into memory.

In all but one special case game code starts at 0x1000 in the ROM.  At that point you do have to set a flag at the end of pifram to prevent the PIF going into an infinite loop.  Besides that, you can basically write whatever you want.  Most people use libraries though so they have threading, standard exception handling, basic display modes, further initialization like turning on exceptions and initializing the various interfaces, etc.  That said, HCS's NES emulator Neon64 was coded in ASM and doesn't do most of that.


The funny thing about bootstraps is that they are directly tied to the CIC chip in the cart being used.  The same seed used for the internal checksum is also used for a separate bootstrap checksum, and the compared value is burned directly into the CIC chip.  The only cases in which you can use a nonstandard bootstrap are if you 1) can burn your own CIC, 2) use a bootloader to start the game, or 3) use one of several emulators that ignore vast portions of the boot sequence.
As such, you're stuck using one of the original bootstraps for any new game.  Most people just choose 7101/6102--the "mario" bootstrap.  Others obfuscate the entrypoint, use different checksums, and a couple change the amount of code initially loaded during the checksum process.

The header:
Code: [Select]
offset size  field
0x00   4     initial PI settings
               80000000  indicator for endianness, though don't think the console looks for this
               00F00000  initial PI_BSD_DOM1_RLS_REG
               000F0000  initial PI_BSD_DOM1_PGS_REG
               0000FF00  initial PI_BSD_DOM1_PWD_REG
               000000FF  initial PI_BSD_DOM1_LAT_REG
0x04   4     clockrate override (0 uses default)
               FFFFFFF0  clockrate
               0000000F
0x08   4     entrypoint (pointer setting where your code loads)
0x0C   4     release
0x10   8     checksum
0x18   8     unused
0x20   0x14  internal name, using codepage-932, usually padded with spaces
0x34   7     unused
0x3B   1     format (can't remember one for iQue)
               'N'  cart
               'D'  64DD disk
               'C'  cartridge part of expandable game
               'E'  64DD expansion for cart
               'Z'  Aleck64 cart
0x3C   2     gameID (alphanumeric)
0x3E   1     country code; there are some multi-regional codes, but major ones are:
               'J'  Japan
               'E'  North America
               'P'  European PAL release
               'D'  Germany
               'S'  Spain
               'F'  France
               'I'  Italy
               'U'  Australia
               'C'  China
               'H'  Netherlands
               'K'  South Korea
0x3F   1     version (00 = 1.0, 15 = 2.5, etc.)
Of these, the console requires:
  • initial PI settings
  • clockrate (if using a standard library, otherwise not)
  • entrypoint
  • checksum
(As a minor caveat there are 1-2 titles that used the country code to select initial language + video settings.  They're exceptions, and nothing after the checksum is even set in prototypes.)
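
If you'd rather poke at a header from a PC before writing any ASM, here's a rough Python sketch that unpacks the fields from the table above.  It assumes a big-endian ROM image; parse_header and the dictionary keys are just names made up for this example, not anything from a library.

```python
import struct

HEADER_SIZE = 0x40  # the header is the first 0x40 bytes of the ROM


def parse_header(rom: bytes) -> dict:
    """Unpack the header fields from a big-endian ROM image (hypothetical helper)."""
    if len(rom) < HEADER_SIZE:
        raise ValueError("need at least 0x40 bytes")
    # Field sizes follow the table: 4+4+4+4, 8, 8, 0x14, 7, 1, 2, 1, 1 = 0x40 bytes.
    (pi, clock, entry, release, checksum, _unused1,
     name, _unused2, fmt, game_id, country, version) = struct.unpack(
        ">IIII8s8s20s7sc2scB", rom[:HEADER_SIZE])
    return {
        "pi_settings": pi,
        "pi_rls": (pi >> 20) & 0xF,    # mask 00F00000
        "pi_pgs": (pi >> 16) & 0xF,    # mask 000F0000
        "pi_pwd": (pi >> 8) & 0xFF,    # mask 0000FF00
        "pi_lat": pi & 0xFF,           # mask 000000FF
        "clockrate": clock,
        "entrypoint": entry,
        "release": release,
        "checksum": checksum.hex(),
        # internal name is codepage-932, usually padded with spaces
        "name": name.decode("cp932", errors="replace").rstrip(" \x00"),
        "format": fmt.decode("ascii", errors="replace"),
        "game_id": game_id.decode("ascii", errors="replace"),
        "country": country.decode("ascii", errors="replace"),
        "version": version,
    }
```

Feed it the first 0x40 bytes of a dump (after converting it to big-endian if needed) and compare the result against the table.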

Emulators will usually demand the initial PI settings match 0x80371240 or they will not load.  They use it to determine if this is an N64 cart and what the endianness of the ROM dump is.  N64 code is all big-endian, but a popular dumper spat out LE.  In actual fact it's just a 'safe' value: it can be changed, and some devices like the 64DD's IPL and development equipment use different values.
Most emulators don't emulate low enough level for the clockrate override to matter.
The entrypoint is important.  Not only does it set where code is loaded, it also is used by certain CPU plugins as a start point for the recompiler (Nemu for instance).
Checksum is required.
Some emulators require the internal name and gameID be known, to varying degrees.  HLE plugins often look up what settings to use based on these.
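
To illustrate the endianness check, here's a small Python sketch of how a tool might use those first four bytes to normalize a dump to big-endian.  The two non-native byte patterns are simply what 80371240 looks like after swapping bytes within each 16-bit pair or reversing each 32-bit word; which dumper produced which layout isn't covered here, and to_big_endian is a made-up name.

```python
BIG_ENDIAN_MAGIC = bytes.fromhex("80371240")   # native big-endian order
SWAPPED16_MAGIC = bytes.fromhex("37804012")    # bytes swapped within each 16-bit pair
REVERSED32_MAGIC = bytes.fromhex("40123780")   # bytes reversed within each 32-bit word


def to_big_endian(rom: bytes) -> bytes:
    """Return the ROM in big-endian order, guessing the layout from the first word."""
    if len(rom) < 4 or len(rom) % 4:
        raise ValueError("ROM length must be a non-zero multiple of 4")
    magic = rom[:4]
    if magic == BIG_ENDIAN_MAGIC:
        return rom
    out = bytearray(rom)
    if magic == SWAPPED16_MAGIC:
        # undo the 16-bit swap: even and odd bytes trade places
        out[0::2], out[1::2] = rom[1::2], rom[0::2]
        return bytes(out)
    if magic == REVERSED32_MAGIC:
        # undo the 32-bit reversal: byte 0 <-> 3 and byte 1 <-> 2 in every word
        out[0::4], out[1::4], out[2::4], out[3::4] = (
            rom[3::4], rom[2::4], rom[1::4], rom[0::4])
        return bytes(out)
    raise ValueError("first word is not a recognised form of 80371240")
```

Since the value can legitimately differ (64DD IPL, development equipment), a real tool would want a fallback instead of just raising.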

+_+

Doing things:
I'm going to assume interrupts are off for this.  If they're on you need to set up the exception vectors.  These are:
Code: [Select]
80000000 bad virtual address
80000080 bad 64bit virtual address, which you can't throw since locked in 32bit address mode
80000100 cache miss
80000180 general exception vector
Those addresses are not suggestions!  In general all will be jumps to the general exception handler, which you'll get to write for yourself if you aren't using library code.  You'll have to test for flags when interrupts are thrown, and they're thrown when a request is completed.

To DMA info from the cart you send a request to the PI.  A4600000 is the base for PI addresses, and A4600010 is its status.  Its &1 bit is set while a DMA is in progress and &2 while an IO operation is happening (like reading a word from a hardware address).  It's usually best to test &3.  Note DMA is in 16bit increments but you can directly read only 32bit values.
First, check that it isn't busy.  Be sure to include a NOP in the loop or you could cause a rather nasty-to-debug lock on hardware.
Code: [Select]
LUI V0,A460
LW V1,0010 (V0)
ANDI V1,V1,0003
BNE V1,R0,-3
NOP
From there you set the rdram address (without the 80000000) at +0 and the hardware address at +4; the DMA starts when you then set either the read or write length.  The length should be one less than the number of bytes you want to read, and the byte count should always be a multiple of 2.
As an example, to load 0x100000 bytes from 0x1000 in the ROM (hardware address 10001000) to 80000400 you could write:
Code: [Select]
LUI V0,A460
ADDIU V1,R0,0400
SW V1,0000 (V0)    ; rdram address
LUI AT,1000
ORI AT,AT,1000
SW AT,0004 (V0)    ; hardware (cart) address
LUI AT,0010
ADDIU AT,AT,FFFF
SW AT,000C (V0)    ; length - 1; this write starts the DMA
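
If it helps to sanity-check the numbers before writing the ASM, here's a tiny Python sketch that works out what goes in each PI register for a cart-to-rdram DMA.  It assumes the usual PI layout (rdram address at +0, cart address at +4, cart-to-rdram length at +0xC) and that cart ROM is mapped at hardware address 10000000; the function names are invented for the example.

```python
PI_BASE = 0xA4600000         # uncached base of the PI registers
CART_ROM_BASE = 0x10000000   # assumed hardware address of cart ROM offset 0


def pi_busy(status: int) -> bool:
    """True if either busy bit (&3) is set in the PI status word."""
    return bool(status & 0x3)


def pi_dma_writes(rom_offset: int, rdram_addr: int, length: int) -> list:
    """Return (register address, value) pairs for a cart -> rdram DMA."""
    if length <= 0 or length % 2 or rom_offset % 2:
        raise ValueError("DMA works in 16bit increments")
    return [
        (PI_BASE + 0x00, rdram_addr & 0x1FFFFFFF),    # rdram address without the 80000000
        (PI_BASE + 0x04, CART_ROM_BASE + rom_offset), # hardware (cart) address
        (PI_BASE + 0x0C, length - 1),                 # length - 1; writing it starts the DMA
    ]
```

For the example above, pi_dma_writes(0x1000, 0x80000400, 0x100000) gives 400, 10001000 and FFFFF for the three registers.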

To read controller input you have to write a request to the PIF using its own command set.  PIFRAM is the 0x40 bytes at BFC007C0 through BFC007FF.  The last byte, when set to 1, "sends" the data.  Typically you write the commands to a buffer, copy this to pifram, then DMA the result back to rdram.  The SI is exclusively used for PIF actions.
Commands are three bytes each plus any additional data they might use.  00 can be used to ignore a channel, 0xFE ends the list, and 0xFF is a NOP you can use to align commands in a prettier way.  First byte indicates #bytes you're sending, second byte #bytes you're reading, and third is the command.  The second byte also will have error codes if something is wrong; 0x80 flag if a device isn't there, and 0x40 if it couldn't be read.
There's a total of six possible devices in the slots, each on a channel.  Each subsequent command checks the next channel, so the first ctrl read will be for slot 1, second for slot 2, etc.  Internally eeprom (not SRAM or FLASH) is on channel 5.  When reading or writing eeprom you can ignore the controller channels by using 00.00.00.00.
Reading the control pak (or anything in the slot) requires you to calc a crc value, and these also have interfaces at different addresses.
Code: [Select]
Status  01.03.00.xxxxxx
    FFFF00  controller type
        0001  absolute (joypads, keyboards, etc.)
        0002  relative (mouse)
        0004  joyport
        0100  VRU (japanese one, USA one may be different)
        1000  internal clock
        4000  16k eeprom
        8000  4k eeprom
    000080  eeprom busy
    000004  address crc error (for controller port)
    000002  controller port empty
    000001  controller port filled
It's usually best to start by finding out what devices are attached before dealing with input.  Input is returned as a bitfield for buttons and two signed axis bytes for the control stick.  The control stick is designed to have a certain degree of play to it, so treat low values (magnitude <5) as 0.  Negative is left/down, positive right/up.  Info here is for standard controllers; other devices differ.
Code: [Select]
Read Controller  01.04.01.xxxxxxxx
    8000.00.00  A button
    4000.00.00  B button
    2000.00.00  Z button
    1000.00.00  Start button
    0800.00.00  + up
    0400.00.00  + down
    0200.00.00  + left
    0100.00.00  + right
    0020.00.00  L button
    0010.00.00  R button
    0008.00.00  C up
    0004.00.00  C down
    0002.00.00  C left
    0001.00.00  C right
    0000.FF.00  left/right axis
    0000.00.FF  up/down axis
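
As a worked example of the layout above, here's a short Python sketch that decodes the four reply bytes of a standard controller read.  The button names, the decode_controller function and the DEADZONE constant are only illustrative; the masks and the "<5 counts as 0" filter come straight from the table and text above.

```python
import struct

BUTTONS = {
    0x8000: "A", 0x4000: "B", 0x2000: "Z", 0x1000: "Start",
    0x0800: "+ up", 0x0400: "+ down", 0x0200: "+ left", 0x0100: "+ right",
    0x0020: "L", 0x0010: "R",
    0x0008: "C up", 0x0004: "C down", 0x0002: "C left", 0x0001: "C right",
}
DEADZONE = 5  # stick values closer to 0 than this are treated as 0


def decode_controller(reply: bytes):
    """Split a 4-byte reply into (pressed buttons, x axis, y axis)."""
    # big-endian 16bit button bitfield, then two signed axis bytes
    buttons, x, y = struct.unpack(">Hbb", reply)
    pressed = [name for mask, name in BUTTONS.items() if buttons & mask]
    x = 0 if abs(x) < DEADZONE else x   # negative = left, positive = right
    y = 0 if abs(y) < DEADZONE else y   # negative = down, positive = up
    return pressed, x, y
```

For instance, the reply bytes 90 01 03 B0 decode to A, Start and C right held, with the x axis filtered to 0 and the y axis at -80 (down).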
Controller state can be reset with the command:
Code: [Select]
Reset 01.03.FF.xxxxxx
same as Status
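
To tie the command format together, here's a Python sketch that lays out a 0x40-byte PIFRAM image polling the four controller slots.  Leading each command with an FF NOP (so every channel takes a tidy 8 bytes), filling the reply slots with FF and zero-padding after the FE are choices made for this example, and the function names are invented.

```python
PIF_RAM_SIZE = 0x40


def build_controller_poll(ports: int = 4) -> bytes:
    """Build a PIFRAM image with one 'read controller' command per port."""
    block = bytearray()
    for _ in range(ports):
        block += bytes([0xFF, 0x01, 0x04, 0x01])  # NOP, send 1 byte, read 4, command 01
        block += bytes([0xFF] * 4)                # room for the 4 reply bytes
    block.append(0xFE)                            # end of command list
    block = block.ljust(PIF_RAM_SIZE, b"\x00")
    block[-1] = 0x01                              # last byte set to 1 "sends" the data
    return bytes(block)


def port_reply(block: bytes, port: int):
    """Return (error flags, reply bytes) for one port of a processed image."""
    base = port * 8
    errors = block[base + 2] & 0xC0   # 0x80 = nothing there, 0x40 = couldn't be read
    return errors, block[base + 4:base + 8]
```

On hardware you would copy this image to pifram, then DMA the result back and run port_reply over it for each slot.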
Eeprom is accessed the same way regardless of its size.  Offsets are in DWs, reading and writing 8 bytes at a time.  Remember eeprom commands don't work on the controller channels.
Code: [Select]
read eeprom  02.08.04.xx.xxxxxxxxxxxxxxxx
    FF.0000000000000000  bank offset; >>3 of actual offset in file (8 bytes per bank)
    00.FFFFFFFFFFFFFFFF  data

write eeprom  0A.01.05.xxxxxxxxxxxxxxxxxx.xx
    FF0000000000000000.00  bank offset
    00FFFFFFFFFFFFFFFF.00  data
    000000000000000000.FF  error byte; 0 if okay
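
Here's a rough Python sketch of building those two eeprom commands, including the 00.00.00.00 prefix to skip the controller channels.  It assumes the offset byte is the byte offset divided by 8 (since reads and writes are 8 bytes at a time) and uses zeros as placeholders for the bytes the PIF fills in; the function names are invented for the example.

```python
SKIP_CONTROLLERS = bytes(4)   # 00.00.00.00 ignores the four controller channels


def _bank(byte_offset: int) -> int:
    """Convert a byte offset into the 8-byte bank index used by the commands."""
    if byte_offset < 0 or byte_offset % 8:
        raise ValueError("eeprom is accessed 8 bytes at a time")
    bank = byte_offset >> 3
    if bank > 0xFF:
        raise ValueError("bank index must fit in one byte")
    return bank


def eeprom_read_cmd(byte_offset: int) -> bytes:
    """02.08.04 + bank offset, followed by room for 8 bytes of data."""
    return SKIP_CONTROLLERS + bytes([0x02, 0x08, 0x04, _bank(byte_offset)]) + bytes(8)


def eeprom_write_cmd(byte_offset: int, data: bytes) -> bytes:
    """0A.01.05 + bank offset + 8 data bytes, followed by room for the error byte."""
    if len(data) != 8:
        raise ValueError("exactly 8 bytes are written per command")
    return (SKIP_CONTROLLERS + bytes([0x0A, 0x01, 0x05, _bank(byte_offset)])
            + data + b"\x00")
```

Either result still needs the FE terminator and the final "send" byte before it goes to pifram, same as any other command list.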

The RSP is a subprocessor that runs a program you send to it, typically either generating PCM audio or RDP command lists for generating a scene.  In actual fact you don't need either in order to fill a framebuffer or push audio.  If you're interested in how to code for it look at krom's RSP and RDP demos.  I don't suggest it at all for a novice to the system.
« Last Edit: September 20, 2015, 03:28:47 pm by Zoinkity »

diaspora

Re: How to make a minimal N64 ROM?
« Reply #3 on: September 21, 2015, 05:58:53 pm »
Thanks for the very detailed reply. :)

In all but one special case game code starts at 0x1000 in the ROM.  At that point you do have to set a flag at the end of pifram to prevent the PIF going into an infinite loop.

How do I do this, and when exactly do I have to do it?

Doing things:
I'm going to assume interrupts are off for this.  If they're on you need to set up the exception vectors.  These are:
Code: [Select]
80000000 bad virtual address
80000080 bad 64bit virtual address, which you can't throw since locked in 32bit address mode
80000100 cache miss
80000180 general exception vector
Those addresses are not suggestions!  In general all will be jumps to the general exception handler, which you'll get to write for yourself if you aren't using library code.  You'll have to test for flags when interrupts are thrown, and they're thrown when a request is completed.
So when an exception occurs and interrupts are enabled, the CPU will jump to one of those four addresses, where I have to insert instructions to jump to my exception handler? Also, why is there an exception vector for cache misses... do cache misses have to be handled in software?

Zoinkity

Re: How to make a minimal N64 ROM?
« Reply #4 on: September 21, 2015, 07:28:35 pm »
The last byte of PIFram is a command byte, but since you can only read 32bit values from hardware you work on the last word.  You want to read the value there (wait for the SI to be ready, LW BFC007FC), OR the value with 8, then write it back (wait for the SI again, SW BFC007FC).  When reading and writing words to hardware addresses use the uncached address (addr | 0xA0000000) but when doing DMA use the address itself (1FC007FC in this case).
Libraries wait until you've initialized working memory, set the initial stack pointer, and set Status to desired settings before doing this, but you can do it as soon as you want.  I forget how many msec you have before you absolutely have to.
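
Just to make the address arithmetic concrete, a few lines of Python (nothing N64-specific runs here, and the names are invented for the example):

```python
PIF_LAST_WORD = 0x1FC007FC   # hardware address of the last word of PIFram


def uncached(addr: int) -> int:
    """Uncached CPU address for a hardware address (addr | 0xA0000000)."""
    return (addr & 0x1FFFFFFF) | 0xA0000000


def with_boot_flag(word: int) -> int:
    """Value to write back after reading the last PIFram word: OR it with 8."""
    return (word | 0x8) & 0xFFFFFFFF
```

So the LW/SW pair targets uncached(PIF_LAST_WORD), which is BFC007FC, while a DMA would use 1FC007FC as-is.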

All cache errors are handled in software, just like all other exceptions.  (That vector is for cache errors, not ordinary cache misses; the hardware services those itself.)  At minimum, you should clear the flags and return from the error.  The only exception you won't ever throw is the 64bit vaddr miss since the system is locked in 32bit address space.

If you don't use the libs there's a number of things you may want to do that normally would be done for you. 
For instance, set the 0x20000000 flag in COP0 Status so you can use COP1 for floating-point math.  If you don't intend to ever use it you can leave it off.  FPU reg 31 can be used to set COP1 Status, which allows you to control how errors are handled and which are caught.
If you ever intend to use the TLB it needs to be initialized.  That just involves writing 0 to all the entries.
The libraries also check whether the NMI flag at 8000030C was set to zero by the bootstrap.  If so they initialize the NMI buffer, a 0x40 byte buffer at 8000031C.  This is a convenient place to put values so you can restore things when the player presses reset.
You'll probably want to write CACHE commands to set writeback invalidate for your code.

If you were to poke around in any typical ROM you'd see that beyond this basic initialization they jump straight into setting up message queues, threading, memory managers, event handlers, etc.  Without a threaded model you're going to be doing a lot of your own management.