Sprite assembly works off of whatever is stored in VRAM, so in order to properly display an enemy sprite, you also have to locate its VRAM pointers (which is a bit annoying, but it works). A VRAM pointer entry tends to hold a pointer to the graphics followed by how much data to transfer. I believe each sprite can have at most THREE such sets of data, so it can get rather confusing.
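To make the "pointer plus transfer size, up to three sets" idea concrete, here's a rough sketch of how you might model it in a tool. This is not the game's actual byte layout (I'm not describing that here); the names `VramTransfer`, `SpriteVramEntry`, `gfx_pointer` and `length` are made up for illustration, and the limit of three just follows my belief above.

```python
from dataclasses import dataclass, field

# Assumption taken from the post: a sprite has at most three transfer sets.
MAX_TRANSFERS = 3


@dataclass
class VramTransfer:
    gfx_pointer: int  # hypothetical: where the graphics data lives
    length: int       # hypothetical: how much data to transfer, in bytes


@dataclass
class SpriteVramEntry:
    transfers: list = field(default_factory=list)

    def add(self, gfx_pointer: int, length: int) -> None:
        """Append one (pointer, size) set, refusing to exceed the limit."""
        if len(self.transfers) >= MAX_TRANSFERS:
            raise ValueError("a sprite is assumed to have at most 3 transfer sets")
        self.transfers.append(VramTransfer(gfx_pointer, length))

    def total_bytes(self) -> int:
        """Total amount of graphics data this sprite would send to VRAM."""
        return sum(t.length for t in self.transfers)
```

Something like this is mostly handy for sanity-checking that an enemy you're porting doesn't need more VRAM than you have free.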
X/Zero sprites are the worst in this case since they're dynamic. Enemy sprites, however, just flat out load a whole sheet. Since most of those are compressed, you're basically good to go once you've loaded the graphics and then loaded the sprite assembly.
Sprite assembly is set per frame. The first byte holds the total number of pieces plus flags for how the sprite is drawn and which direction it faces (+20 to make it use 16x16 tiles, +40 to dictate flipping). Each piece that follows consists of: X coordinate, Y coordinate, and the graphic to use from VRAM.
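If you want to read a frame in a script, the layout above could be parsed roughly like this. Treat it as a sketch under a few assumptions of mine: the +20 and +40 values are hex (0x20, 0x40), the piece count sits in the low five bits of the first byte, and every piece is exactly three bytes. The names `Piece`, `Frame` and `parse_frame` are just for this example.

```python
from dataclasses import dataclass

COUNT_MASK = 0x1F  # assumption: piece count lives in the low 5 bits
FLAG_16X16 = 0x20  # "+20": use 16x16 tiles (assumed to be hex)
FLAG_FLIP = 0x40   # "+40": flipping (assumed to be hex)


@dataclass
class Piece:
    x: int     # X coordinate byte, kept raw (signedness not covered here)
    y: int     # Y coordinate byte, kept raw
    tile: int  # graphic to use from VRAM


@dataclass
class Frame:
    large_tiles: bool
    flipped: bool
    pieces: list


def parse_frame(data: bytes, offset: int = 0) -> Frame:
    """Decode one frame: a header byte, then 3 bytes per piece."""
    header = data[offset]
    count = header & COUNT_MASK
    end = offset + 1 + count * 3
    if end > len(data):
        raise ValueError("frame data is shorter than the piece count implies")
    pieces = []
    for pos in range(offset + 1, end, 3):
        x, y, tile = data[pos:pos + 3]
        pieces.append(Piece(x, y, tile))
    return Frame(bool(header & FLAG_16X16), bool(header & FLAG_FLIP), pieces)
```

For example, `parse_frame(bytes([0x22, 0x00, 0x10, 0x04, 0x08, 0xF8, 0x06]))` would give you a frame with two pieces and the 16x16 flag set, with no flipping.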