
Author Topic: Making analogies between features of high level languages and assembly languages  (Read 874 times)

MysticLord

  • Jr. Member
  • **
  • Posts: 67
I have had a hard time understanding MIPS R3000 for the past 4 years or so. What I have grasped are the things that have direct analogies to high-level languages.

Branching is analogous to if/else-if/else and switch-case conditionals.

Jump-And-Link (jal) is a subroutine call.
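
For instance, as I understand it (register names, labels and the calling convention here are only my own illustration):

Code: [Select]
# if (a0 == 0) { v0 = 1; } else { v0 = 2; }
        bne   a0,$0,else_part   # branch if a0 != 0
        nop                     # branch delay slot
        ori   v0,$0,1           # v0 = 1
        j     end_if
        nop
else_part:
        ori   v0,$0,2           # v0 = 2
end_if:

# v0 = do_thing();  jal stashes the return address in ra, the callee ends with jr ra
        jal   do_thing
        nop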

I also don't understand the incremental nature of MIPS opcodes. Is there a quick-and-dirty guide to conceptually breaking down chunks of assembly into C? Like assignments, pointer/array management, loops, and so on?

What analogies helped you better understand your preferred assembly languages? Has anyone written a book which contains exhaustive comparisons between a higher level language like C and a given assembly language?

I'm not looking for a perfect guide so much as a "good enough" solution (or "a rough and approximate conceptual framework", if you want to use big words my caveman brain has trouble with) to deal with 90% of what I'll encounter when working with MIPS.

Feel free to discuss the same topic, as it applies to assembly languages you use. Watching people smarter than myself discuss things provides many insights I would never otherwise come across.

FAST6191

  • Hero Member
  • *****
  • Posts: 2962
Is this an example of "everybody programs really well in their first computer language, and just makes all the others they learn fit that paradigm"? Equally, I know modern flavours of C have any number of bells and whistles and nice standard libraries, but it was often (if jokingly at points) referred to as "portable assembler", and that still applies. Hopefully said 4 years were also not spent staring at a CPU description document and hoping it would one day leap into your head and suddenly give you a full understanding.

https://stuff.pypt.lt/ggt80x86a/asm1.htm does pretty well at building up a practical end result from basic instructions.
If you get on well with that then https://www.youtube.com/watch?v=hE7l6Adoiiw&list=PL6B940F08B9773B9F&index=1 is also something that might work for you. You don't have to do the whole series (and he has a nice follow-up one as well), but if you already have pointers down from C in general then by the time it covers those you will probably have most of what you want from it.
To round out the list, http://www.plantation-productions.com/Webster/ also covers x86.

Anyway, I can't say I have ever really had that problem. However, I mostly came from an electronics background (or at least had one), so the "everything is adding" view (adding is adding, subtraction is a type of adding, multiplication is long-form adding, division is subtraction if you use logarithms, which is a type of adding, comparisons are adding) and building up from basic logic steps is nothing unusual: build any gate from a collection of NAND gates, build an adding machine, a flip-flop, a shift register, a set of shift registers... If you know C then I assume you can skip all the ceiling/floor/float/shift/rotate/signed... stuff, and I hope you also have maths as far as log tables and trigonometry tables. Whether you want to do a little crash course in digital electronics and digital logic I don't know, but it usually puts people in reasonable stead, especially when it comes time to start playing with extra hardware on carts/addons/the system in question that might be rather basic in what it does but is still digital electronics.
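
To make the "everything is adding" point concrete in MIPS terms, here is a rough sketch (register roles invented for the example): subtraction is just addition of the two's complement, which is also what the hardware's subtract instruction boils down to.

Code: [Select]
# v0 = a0 - a1, without using a subtract instruction
        nor   t0,a1,$0       # t0 = ~a1 (nor against zero is a bitwise NOT)
        addiu t0,t0,1        # t0 = ~a1 + 1 = -a1 (two's complement)
        addu  v0,a0,t0       # v0 = a0 + (-a1) = a0 - a1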

Most high-level languages are working around the limitations of using bits to store information in what is anything but a binary universe, and abstracting that away from people to free them up to think about bigger things. CPUs feed back into that as newer classes of information processing come into vogue and get put into silicon, rather than burning hundreds of instructions on what might be done in one, or indeed doing hundreds of effectively identical operations in one (SIMD = single instruction, multiple data, after all).


Analogies are not usually how I would view things for assembly either. It is not without merit (decompilation is sure to be really fun over the coming decades), but I think instead in terms of processes/systems in the classical machine sense (a car has a drive system, power system, braking system, control system... and they interact and influence each other at certain points) or the biology sense of the term. Usually that means maths, housekeeping aka basic IO (fiddling with registers, flags, memory, states and what have you; get a bit further* and you can also wind in a bit of security and operating systems**), and program flow.

*not that anything other than the most recent consoles really has it.

**it cuts the other way as well. Once you have to do the whole push-and-pop routine yourself to call a function or something, you will quickly understand why you are told it is bad form to call a subroutine within a subroutine.
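
As a rough sketch of what that push-and-pop routine looks like in MIPS terms (o32-style, names invented): jal overwrites ra, so any routine that calls another routine has to spill ra to the stack and restore it before returning.

Code: [Select]
outer:
        addiu sp,sp,-8       # make room on the stack
        sw    ra,4(sp)       # push the return address before the jal below destroys it
        jal   inner          # call the nested subroutine; ra now points back here
        nop                  # branch delay slot
        lw    ra,4(sp)       # pop the original return address
        addiu sp,sp,8        # give the stack space back
        jr    ra             # return to whoever called outer
        nop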

Maths is then one of the big ones, and that usually comes down to some CPU designer having to implement every case of an add, subtract, divide, multiply... operation for every length of signed, float, fixed-point, boolean, simple flag or whatever they figured was necessary and within their transistor budget. The 6502 peeps scare me by having registers be unique, special things rather than the list of equally viable options (for most purposes) that I view ARM and x86 as, but in the end there are so few registers in the 6502 that it makes some sense to do it that way, and underneath it all I know that if I looked at the transistors I would see the same thing.
Some also find it valuable to figure out what is missing (the ARM cores in the GBA and DS lack a divide instruction, and both do poorly at anything floating point). Some also reckon more limitations are a good thing, and it is not a position without merit, but I would sooner train my mind to avoid certain actions on certain systems than have to remember to use them where they are available on another.

Spend some time playing hacker on games, either hardcoding cheats or tweaking things slightly -- cause a calculation to be skipped (congratulations, there is no more stupid random-luck variable in what might otherwise have been a pure game), tweak some maths just slightly, add something a bit more random to something... and most of the rest will fall into place.

To answer some of the questions directly.
I am not sure what you mean by incremental. It has been a while since I delved into MIPS, but I never noted anything strange there, and a quick scan turned up nothing I would particularly call incremental.
Quick and dirty guide... that depends more on how well you know C and can guess what something is attempting to do and fit it accordingly, possibly tempered further with knowledge of how coding played out on the system in question and how it was taught back in the day. Sometimes you can augment it a bit with the output of a dynamic recompiler, but eh.
Book of comparisons... no, and I doubt there ever will be one. At very best, some PhD writing a decompiler might have done something exhaustive for a given piece of code or standard library from a given version of a given compiler (or maybe two), and that is more for those writing a decompiler than anything like what you might want.


KingMike

  • Forum Moderator
  • Hero Member
  • *****
  • Posts: 7037
  • *sigh* A changed avatar. Big deal.
I just remember looking at MIPS for a short time and thinking they may have OVER-simplified the instruction set.
I only spent a short time trying to disassemble a block of code for one PS1 game I was looking at, and it seemed they needed a whole bunch of code to do some mildly complex operation (was it comparisons or something? I can't remember precisely what it was -- just some concept almost any 8/16-bit CPU could do natively).
"My watch says 30 chickens" Google, 2018

MysticLord

  • Jr. Member
  • **
  • Posts: 67
FAST6191, would you recommend nand2tetris to get the "everything is addition" perspective?

https://www.nand2tetris.org/

tl;dr
Start with logic gates and move up the abstraction layers to C.

[Unknown]

  • Jr. Member
  • **
  • Posts: 37
    • PPSSPP
Quote from: MysticLord
I also don't understand the incremental nature of MIPS opcodes. Is there a quick-and-dirty guide to conceptually breaking down chunks of assembly into C? Like assignments, pointer/array management, loops, and so on?

Outside delay slots, there are a lot of things that map very simply.

Code: [Select]
ori v0,$0,0x1234

v0 = 0x1234;

A lot of code will be like this.  It's true for MIPS, ARM, x86, anything.  Some tricks you might see are "magic" multiplies, like the ones described here:

http://aggregate.org/MAGIC/

For example, this:
http://aggregate.org/MAGIC/#Integer%20Constant%20Multiply

It's good to understand the basics of binary.  But sometimes, you might see:

Code: [Select]
sll v0,v1,1
add v0,v0,v1

This is actually equivalent to v0 = v1 * 3.  In other words, v0 = (v1 * 2) + v1.  That's quicker than a multiply, so the compiler may do it to save time.
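
The same idea shows up for other small constants; for example, a compiler might plausibly emit something like this for a multiply by 10 (register choices invented for the sketch):

Code: [Select]
sll  t0,v1,3        # t0 = v1 * 8
sll  t1,v1,1        # t1 = v1 * 2
addu v0,t0,t1       # v0 = (v1 * 8) + (v1 * 2) = v1 * 10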

If you're trying to read assembly, I recommend breaking it up first by branches.  As you mentioned, this will help you isolate the parts inside ifs, etc.  Add a blank line before every branch target, and after every branch.  Once you do this, you are likely to see clearer patterns in the chunks that are not separated.  In compiler speak, these are called "basic blocks".
https://en.wikipedia.org/wiki/Basic_block
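
For example, a small counting loop might come apart like this once the blank lines go in (labels, registers and values are invented for the sketch; delay slots left out for clarity):

Code: [Select]
        ori   t0,$0,0        # block 1: t0 = 0 (loop counter)
        ori   t1,$0,0        #          t1 = 0 (running total)

loop:
        addu  t1,t1,a0       # block 2: t1 += a0
        addiu t0,t0,1        #          t0++
        slti  t2,t0,10       #          t2 = (t0 < 10) ? 1 : 0
        bne   t2,$0,loop     #          if (t2 != 0) goto loop

        addu  v0,$0,t1       # block 3: v0 = t1
# roughly: for (t0 = 0; t0 < 10; t0++) t1 += a0;  v0 = t1;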

Another thing that is harder is memory/pointer references and structs.  For example, you might see:

Code: [Select]
sll v1,v1,4
addu t0,v0,v1
lw t1,4(t0)

What this would equate to is similar to:

Code: [Select]
struct Something {
    int unk00;
    int unk04;
    int unk08;
    int unk0c;
};

Something *v0;

t1 = v0[v1].unk04;

Because Something, in my example, is 16 bytes long - you see it adding v1 * 16 to v0.  But in your mind, you're just offsetting the array by v1.  Assembly much more often deals with the address directly, and MIPS doesn't have the complex displacements that x86 has.

I think the important things to wrap your head around are:
 * Variable/register "renaming" (mapping of variables to registers, not always 1:1.)
 * Memory addresses and pointer offsets / array indexing.
 * Bit shifts and binary/hex math (small sketch below).
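
As a tiny sketch of that last point (values invented), masks and shifts are how individual bytes and flags get picked out of a word:

Code: [Select]
andi t0,a0,0xFF      # t0 = a0 & 0xFF  (lowest byte)
srl  t1,a0,8         # t1 = a0 >> 8
andi t1,t1,0xFF      # t1 = second byte of a0
srl  t2,a0,31        # t2 = top (sign) bit of a0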

-[Unknown]

STARWIN

  • Sr. Member
  • ****
  • Posts: 452
Don't think about abstraction layers; those are just fancy words for categorizing useless concepts. In addition, while C may seem simple and asm hard at first, mastering C is much, much harder, as it has far more rules and concepts than many kinds of asm.

What puzzles me is what exactly it is that you don't understand about MIPS asm, because asm generally works so that nearly all instructions are easy to understand on their own, in isolation. Generally the hard part in reverse engineering is that there is too much asm to read in any reasonable amount of time, and that there are no comments or names for anything. That's why a debugger is so important: it lets you examine RAM state so that you can "name" memory locations in your own documentation of the effort.

[Unknown] provided a fairly good view of many things you'll see when reading/reverse-engineering MIPS asm. I could add that you can think of all registers and all memory locations as variables. They may not be variables that the C (or whatever language) source had visible, but when you make the assignments in the order the code tells you, you basically have the algorithm there. Basic math, assignments and branches. Just translate one line at a time to pseudocode and it should work; for branches it's often natural to end up with slightly shorter pseudocode (and you'll naturally compress your pseudocode for better readability, e.g. r1=r1+5; r1=r1*2; can be folded in reverse into r1=(r1+5)*2;).
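
For instance, a line-by-line translation of a small MIPS fragment might go like this before you compress it (registers and values invented):

Code: [Select]
lw    t0,0(a0)       # t0 = *a0;
addiu t0,t0,5        # t0 = t0 + 5;
sll   t0,t0,1        # t0 = t0 * 2;
sw    t0,0(a0)       # *a0 = t0;
# compressed: *a0 = (*a0 + 5) * 2;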