understanding the keywords eax and mov - masm

I am trying to understand the registers in asm, but every website I look at just assumes I know something about registers, and I just cannot get a grip on it. I know about a book's worth of C++, and as far as I know mov var1,var2 would be the same thing as var1 = var2, correct?
But with the eax register I am completely lost. Any help is appreciated.

Consider registers as per-processor global variables. There's "eax", "ebx", and a bunch of others. Furthermore, you can only perform certain operations via registers - for example there's no instruction to read from one memory location and write it to another (except when the locations are denoted by certain registers - see movsb instruction, etc).
So the registers are generally used only for temporary storage of values that are needed for some operation, but they usually are not used as global variables in the conventional sense.
You are right that "mov var1, var2" is essentially an assignment - but you cannot use two memory-based variables as operands; that's not supported. You could instead do:
mov eax, var1
mov var2, eax
... which has the same effect, using the eax register as a temporary.
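As a rough illustration of the two-step copy above, here is a minimal C sketch. It assumes GCC or Clang on x86 and uses AT&T syntax (source operand first, the reverse of MASM); the function name copy_via_register is made up for this example, and other targets fall back to plain C.

```c
#include <stdint.h>

/* Copies *src to *dst through eax, because x86 has no general
   memory-to-memory mov. The function name is hypothetical. */
static void copy_via_register(const int32_t *src, int32_t *dst)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movl %1, %%eax\n\t"   /* eax = *src  (mov eax, var1) */
             "movl %%eax, %0"       /* *dst = eax  (mov var2, eax) */
             : "=m" (*dst)
             : "m" (*src)
             : "eax");
#else
    int32_t tmp = *src;             /* portable fallback: tmp plays the role of eax */
    *dst = tmp;
#endif
}
```

A C compiler normally emits this kind of load/store pair on its own for an ordinary assignment such as `*dst = *src;`.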

eax refers to a processor register (essentially a variable).
mov is an instruction that copies data from its second operand to its first; the operands can be registers, a memory location, or (for the source) a constant. So essentially you are correct (in a handwavey sense).
Do you have an example assembly block you want to discuss?

Think of eax as a storage location where a value can be kept, much like a C++ variable whose type (int, long, ...) determines its size. Unlike a variable, though, eax is not a location in memory: it is a register inside the CPU itself, and on x86 it is 32 bits wide. The e in eax stands for "extended" (it extends the 16-bit ax register). This register is used implicitly by the multiplication and division instructions and is normally called the extended accumulator register.
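To make the implicit use of the accumulator concrete, here is a small sketch, assuming GCC or Clang on x86 (AT&T syntax). The mul instruction takes one explicit operand, multiplies it by eax, and leaves the 64-bit product in edx:eax. The function name mul_via_eax is made up for this example.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64-bit multiply using the implicit eax/edx operands of mul. */
static uint64_t mul_via_eax(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    __asm__ ("mull %[b]"            /* edx:eax = eax * b */
             : "=a" (lo), "=d" (hi) /* low half in eax, high half in edx */
             : "a" (a), [b] "r" (b) /* a must already be in eax */
             : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)a * b;         /* portable fallback */
#endif
}
```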

Related

GCC Extended assembly pin local variable to any register except r12

Basically I am looking for a way that I pin a temporary to any register except r12.
I know I can "hint" the compiler to pin to a single register with:
// Toy example. Obviously an unbalanced `pop` in
// extended assembly will cause serious problems.
register long tmp asm("rdi"); // or just clobber rdi and use it directly.
asm volatile("pop %[tmp]\n" // using pop hence don't want r12
: [tmp] "=&r" (tmp)
:
:);
This will generally work, in that it avoids r12, but it might mess up the compiler's register allocation elsewhere.
Is it possible to do this without forcing the compiler to use a single register?
Note that register asm doesn't truly "pin" a variable to a register; it only ensures that uses of that variable as an operand in inline asm will use that register. In principle the variable may be stored elsewhere in between. See https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Local-Register-Variables.html#Local-Register-Variables. But it sounds like all you really need is to ensure that your pop instruction doesn't use r12 as its operand, possibly because of the question "Why is POP slow when using register R12?". I'm not aware of any way to do precisely this, but here are some options that may help.
The registers rax, rbx, rcx, rdx, rsi, rdi each have their own constraint letters, a,b,c,d,S,D respectively (the other registers don't). So you can get about halfway there by doing
long tmp;
asm volatile("pop %[tmp]\n"
: [tmp] "=&abcdSD" (tmp)
:
:);
This way the compiler has the option to choose any of those six registers, which should give the register allocator a lot more flexibility.
Another option is to declare that your asm clobbers r12, which will prevent the compiler from allocating operands there:
long tmp;
asm volatile("pop %[tmp]\n"
: [tmp] "=&r" (tmp)
:
: "r12");
The tradeoff is that it will also not use r12 to cache local variables across the asm, since it assumes that it may be modified. Hopefully it will be smart enough to just avoid using r12 in that part of the code at all, but if it can't, it may emit extra register moves or spill to the stack around your asm. Still, it's less brutal than -ffixed-r12 which would prevent the compiler from using r12 anywhere in the entire source file.
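For experimenting with the clobber approach without touching the stack, here is a minimal sketch that swaps the pop for a harmless register-to-register mov. It assumes GCC or Clang on x86-64; the function name copy_avoiding_r12 is made up for this example.

```c
/* The "r12" clobber keeps the compiler from allocating either operand
   to r12, while leaving every other general-purpose register available. */
static long copy_avoiding_r12(long in)
{
    long tmp;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("mov %[in], %[tmp]"    /* AT&T syntax: tmp = in */
             : [tmp] "=r" (tmp)
             : [in] "r" (in)
             : "r12");
#else
    tmp = in;                       /* portable fallback for other targets */
#endif
    return tmp;
}
```

Looking at the generated assembly (for example with -S) is the way to confirm which registers were actually chosen.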
Future readers should note that in general it is unsafe to modify the stack pointer inside inline asm on x86-64. The compiler assumes that rsp isn't changed by inline asm, and it may access stack variables via effective addresses with constant offsets relative to rsp, at any time. Moreover, x86-64 uses a red zone, so even a push/pop pair is unsafe, because there may be important data stored below rsp. (And an unexpected pop may mean that other important data is no longer in the red zone and thus subject to overwriting by signal handlers.) So, you shouldn't do this unless you're willing to carefully read the generated assembly after every recompilation to make sure the compiler hasn't decided to do any of these things. (And before you ask, you cannot fix this by declaring a clobber of rsp; that's not supported.)

Segmentation fault when attempting to print int value from x86 external function [duplicate]

I've noticed that a lot of calling conventions insist that [e]bx be preserved for the callee.
Now, I can understand why they'd preserve something like [e]sp or [e]bp, since that can mess up the callee's stack. I can also understand why you might want to preserve [e]si or [e]di since that can break the callee's string instructions if they aren't particularly careful.
But [e]bx? What on earth is so important about [e]bx? What makes [e]bx so special that multiple calling conventions insist that it be preserved throughout function calls?
Is there some sort of subtle bug/gotcha that can arise from messing with [e]bx?
Does modifying [e]bx somehow have a greater impact on the callee than modifying [e]dx or [e]cx for instance?
I just don't understand why so many calling conventions single out [e]bx for preservation.
Not all registers make good candidates for preserving:
no    (e)ax -- implicitly used in some instructions; return value
no    (e)dx -- edx:eax is implicitly used in cdq, div, mul and in return values
      (e)bx -- generic register, usable in 16-bit addressing modes (base)
      (e)cx -- shift-counts, used in loop, rep
      (e)si -- movs operations, usable in 16-bit addressing modes (index)
      (e)di -- movs operations, usable in 16-bit addressing modes (index)
must  (e)bp -- frame pointer, usable in 16-bit addressing modes (base)
must  (e)sp -- stack pointer, not addressable in 8086 (other than push/pop)
Looking at the table, two registers have good reason to be preserved and two have a reason not to be preserved. The accumulator, (e)ax, is for example the most often used register because of its short encodings. SI and DI make a logical register pair: REP MOVS and other string operations trash both.
In a half-and-half callee/caller saving paradigm, the only real question is whether bx/cx is preferred over si/di as the preserved pair. In other calling conventions, it's just EDX, EAX and ECX that can be trashed.
EBX does have a few obscure implicit uses that are still relevant in modern code (e.g. CMPXCHG8B / CMPXCHG16B), but it's the least special register in 32/64-bit code.
EBX makes a good choice for a call-preserved register because it's rare that a function will need to save/restore EBX because they need EBX specifically, and not just any non-volatile register. As Brett Hale's answer points out, it makes EBX a great choice for the global offset table (GOT) pointer in ABIs that need one.
In 16-bit mode, addressing modes were limited to (any subset of) [BP|BX + DI|SI + disp8/disp16], so BX is definitely special there.
This is a compromise between not saving any of the registers and saving them all. Either saving none, or saving all, could have been proposed, but either extreme leads to inefficiencies caused by copying the contents to memory (the stack). Choosing to allow some registers to be preserved and some not, reduces the average cost of a function call.
One of the main reasons, certainly for the i386 ELF ABI, is that ebx is the register that holds the address of the global offset table (GOT) for position-independent code (PIC). See page 3-35 of the specification for the details. It would be disruptive in the extreme if, say, shared library code had to restore the GOT pointer after every function call return.
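The caller-saved/callee-saved split discussed in these answers can be written down as a small lookup. This is a sketch for the 32-bit System V (cdecl) convention only, where eax, ecx and edx may be trashed by a call and the rest must be preserved; the array and function names are made up for this example.

```c
#include <stdbool.h>
#include <string.h>

/* Registers a callee must preserve under the i386 System V convention.
   eax, ecx and edx are caller-saved and are therefore not listed. */
static const char *const callee_saved[] = { "ebx", "esi", "edi", "ebp", "esp" };

static bool is_callee_saved(const char *reg)
{
    for (size_t i = 0; i < sizeof callee_saved / sizeof callee_saved[0]; i++) {
        if (strcmp(reg, callee_saved[i]) == 0)
            return true;
    }
    return false;
}
```

Other conventions draw the line differently, so the table would need adjusting for them.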

How do I make sure that a variable defined with the "register" specifier got stored in a CPU register?

I want to know: how do we make sure that a variable defined with the register specifier got stored in a CPU register?
Basically, you cannot. There is absolutely nothing in the C standard that gives you the control.
Using the register keyword gives the compiler a hint that the variable may be stored in a register (i.e., allowed the fastest possible access). The compiler is free to ignore it. Each compiler can have a different way of accepting or rejecting the hint.
Quoting C11, chapter §6.7.1, (emphasis mine)
A declaration of an identifier for an object with storage-class specifier register
suggests that access to the object be as fast as possible. The extent to which such
suggestions are effective is implementation-defined.
FWIW, most modern-day compilers can detect the most-used variables and allocate them in actual registers if required. Remember, CPU registers are a scarce resource.
Disassemble the code and check. It may not really be clear at that point, because variables don't really exist, they're just names that link producers with consumers. So, there is not necessarily a register reserved for that variable - maybe it disappeared entirely, maybe it lives in several registers over its lifetime, maybe none of the above.
Historically, the register keyword was introduced decades ago as an optimization hint to the compiler. Nowadays, when the processors have more general purpose registers, the compiler usually places variables in registers even without being told so (when the code is compiled with optimizations).
Being just a hint and not an enforcement, you cannot do anything to force it. You can, however, write that part of the code in assembler. This way you have complete control of where your variables are stored.
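A minimal sketch of what "just a hint" means in practice: the function below behaves identically with or without the register keyword, and the only thing the language itself guarantees is that the variable's address cannot be taken. The function name sum_to is made up for this example.

```c
/* Sums 1..n with the loop counter declared `register`. */
static int sum_to(int n)
{
    register int i;   /* hint only: the compiler may keep i anywhere it likes */
    int total = 0;

    for (i = 1; i <= n; i++)
        total += i;

    /* int *p = &i;      would not compile: C forbids taking the
                         address of a register-qualified object */
    return total;
}
```

Whether i actually lives in a register can only be seen in the generated assembly, and it may differ between optimization levels.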
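If a variable is stored in a register, it is not stored in memory and so has no address.
So one idea is to try to print the address of the variable using printf. If the program compiles and prints an address, the variable is in memory and acts as an auto storage class variable (it is not stored in a register).
If the compiler instead rejects the code with an error such as "address of register variable requested", the variable is being treated as a register storage class variable. (The error "incompatible implicit declaration of built-in function 'printf'" is unrelated; it only means that stdio.h was not included.) Keep in mind that this check reflects the register specifier in the declaration, not the compiler's actual register allocation.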
Maybe using assembly instructions will help with it (32-bit x86, Intel syntax, written as pseudocode rather than exact inline-asm syntax):
/// Function must be something like this:
int check_register_storing()
{
__asm__ (
pushad // Save registers
xor ebx, ebx // Set to zero ("and ebx, ebx" would leave the value unchanged)
xor eax, eax
xor ecx, ecx
xor edx, edx
);
// Set test number.
register int a = 8; // Initial value
int from_register = 0;
__asm__ (
add eax, ebx // If the 'a' variable was placed in a CPU register,
add eax, ecx // one of the main registers must contain 8
add eax, edx // and the others must contain 0
mov from_register, eax
popad // Restore the saved register values
);
/// Check result
printf( "Original saved number: %d, Returned number from main registers: %d\n", a, from_register );
return from_register;
}
I don't know if I am wrong or right, but here is my reasoning. A normal variable is stored in memory, which has an address. If we write register int a; then a register may be allocated for the variable, and registers have names, not addresses. So we cannot make a pointer point to a register, because pointers store addresses only. So if we write the following:
#include<stdio.h>
int main()
{
register int reg = 5;
int *p = &reg;
printf("%d",reg);
}
then the compiler should give the error "address of register variable 'reg' requested". Note that in C this error depends only on the register specifier in the declaration, so it appears whether or not a register was actually allocated; it cannot tell you where the variable ended up.
IMPORTANT:-This is my first answer on stackoverflow, I may be wrong, please correct me if i am, I'm still learning.

How to index arrays properly in x86 assembly

I am trying to make sure that I understand the SI and DI registers. My background in assembly language is somewhat limited to 6502, so bear with me here.
I have a quick example of how I would go about using SI as a simple counter. I am a bit concerned that I might be misusing this register though.
mov si, 0 ; set si to 0
mov cx, 5 ; set cx to 5 as we will count down to 1
do:
mov ah, 02h ; setup 02h DOS character output interrupt
mov dl, [table + si] ; grab our table with the si offset
add dl, '0' ; convert to ascii integer
int 21h ; call DOS service
inc si ; increment si
loop do ; repeat until cx = 0
ret
table: db 1,2,3,4,5
---
OUTPUT:> 12345
Is this the right way to use SI? I know in 6502 assembly, you can use the X and Y registers to offset arrays / tables. However, in my studies of x86, I am starting to realize how much more there is to work with. Such as how CX is automatically decremented in the 'loop' instruction.
I am hoping that moving forward, I will be able to save resources by writing efficient code.
Thank you in advance for your input.
This use of SI is perfectly fine. SI has the benefit of being a preserved register in most Intel calling conventions. Also, historically, SI was one of the few registers that you could use as an index in a memory load operation; in a modern Intel CPU, any register would do.
SI still gets some special treatment with the lods instruction.
Your program actually works fine. Adding org $100 at the beginning, I managed to compile it with FASM and run it in DosBox.
On the 6502 you have two index registers (X and Y) that you can use in different ways (direct, indirect, indirect indexed, indexed indirect, ...).
On the x86 you have 4 registers that can be used as pointer registers: BX, BP, SI and DI (in 32-bit mode you can use nearly all registers)
A base register (BX or BP) can be combined with an index register (SI or DI), for example [BX+DI+10].
BP is typically used for storing the old stack pointer when entering a function (when using a C compiler). However, there is no misuse of registers (unless you use the stack pointer for something different) when you program in assembler. You cannot do anything wrong!
But be careful: On the x86 (in 16-bit mode) you also have to care about the segment registers - this is what the 6502 does not have!
These registers are needed because you can only address 64 KiB using a 16-bit register, but the 8086 has a 1 MiB address space. To solve this, an address is composed of a 16-bit segment and a 16-bit offset, so an address is effectively not 16 but 32 bits long. The exact meaning of the first 16 bits depends on the operating mode of the CPU.
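For real mode specifically (an assumption for this sketch; in protected mode the segment value is a selector into a descriptor table instead), the segment and offset are combined as segment * 16 + offset. The function name real_mode_linear is made up for this example.

```c
#include <stdint.h>

/* Real-mode address translation: shift the segment left by 4 bits
   (multiply by 16) and add the 16-bit offset. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For instance, B800:0000 maps to linear address 0xB8000, and many different segment:offset pairs map to the same linear address.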
The following segment registers are present:
CS: CS:IP is the instruction pointer
SS: SS:SP is the stack pointer; used for SP and BP pointer operations by default
DS: Used for all other pointer operations (all but SP and BP) by default
ES: Additional register
FS, GS: Additional registers since 80386
You can override the default segment register to be used:
MOV AX,ES:[SI+100] ; Load from ES:SI+100 instead of DS:SI+100
String operations (like movsb) access DS:SI and ES:DI by default. The DS:SI source segment can be changed with a segment override prefix, but the ES:DI destination segment cannot.
That's an alright use of SI. But you could use several other registers in its place (although beware that unlike 32-bit x86, 16-bit x86 code limits the set of registers on which indexing is supported. The ModR/M byte encoding governs this.)
You might want to consider doing an add si, table before the loop and mov dl, [si] inside it. It makes the loop slightly easier for the human to read, because there's one less variable in play.
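For readers more at home in C, the two loop shapes discussed in this thread correspond roughly to array indexing and pointer walking. This is only a sketch: the function names are made up, the variables si and cx are named after the registers they stand in for, and the output goes to a buffer instead of the DOS character-output service.

```c
#include <stddef.h>

static const unsigned char table[5] = { 1, 2, 3, 4, 5 };

/* Indexed form: mirrors mov dl, [table + si] with si counting up from 0. */
static void digits_indexed(char *out)
{
    for (size_t si = 0; si < 5; si++)
        out[si] = (char)(table[si] + '0');   /* add dl, '0' */
    out[5] = '\0';
}

/* Pointer form: mirrors add si, table before the loop and mov dl, [si] inside it. */
static void digits_pointer(char *out)
{
    const unsigned char *si = table;
    for (int cx = 5; cx != 0; cx--)          /* loop decrements cx until it reaches 0 */
        *out++ = (char)(*si++ + '0');
    *out = '\0';
}
```

Both functions need a buffer of at least 6 bytes and fill it with the same digits.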

Compare and swap in machine code in C

How would you write a function in C which does an atomic compare and swap on an integer value, using embedded machine code (assuming, say, x86 architecture)? Can it be any more specific if it's written only for the i7 processor?
Does the translation act as a memory fence, or does it just ensure ordering relation just on that memory location included in the compare and swap? How costly is it compared to a memory fence?
Thank you.
The easiest way to do it is probably with a compiler intrinsic like _InterlockedCompareExchange(). It looks like a function but is actually a special case in the compiler that boils down to a single machine op. In the case of the MSVC x86 intrinsic, that works as a read/write fence as well, but that's not necessarily true on other platforms. (For example, on the PowerPC, you'd need to explicitly issue a lwsync to fence memory reordering.)
In general, on many common systems, a compare-and-swap operation usually only enforces an atomic transaction upon the one address it's touching. Other memory access can be reordered, and in multicore systems, memory addresses other than the one you've swapped may not be coherent between the cores.
You can use the CMPXCHG instruction with the LOCK prefix for atomic execution.
E.g.
lock cmpxchg DWORD PTR [ebx], edx
or
lock cmpxchgl %edx, (%ebx)
This compares the value in the EAX register with the value at the address stored in the EBX register and stores the value in the EDX register to that location if they are the same, otherwise it loads the value at the address stored in the EBX register into EAX.
You need to have a 486 or later for this instruction to be available.
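Wrapped in a C function, the instruction above might look like the following sketch. It assumes GCC or Clang extended asm on x86 (AT&T syntax) and falls back to the compiler's __atomic_compare_exchange_n builtin on other targets; the function name cas_int is made up for this example.

```c
/* Atomically: if (*ptr == expected) { *ptr = desired; return 1; } else return 0; */
static int cas_int(int *ptr, int expected, int desired)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned char ok;
    __asm__ __volatile__ (
        "lock cmpxchgl %[newval], %[mem]\n\t"  /* compare eax with *ptr, store desired if equal */
        "sete %[ok]"                           /* ZF is set when the exchange happened */
        : [mem] "+m" (*ptr), "+a" (expected), [ok] "=q" (ok)
        : [newval] "r" (desired)
        : "memory", "cc");
    return ok;
#else
    /* Fallback for other targets, using the GCC/Clang builtin. */
    return __atomic_compare_exchange_n(ptr, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```

The "memory" clobber is there so the compiler does not move other memory accesses across the asm statement; on x86 the locked instruction itself also acts as a full barrier.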
If your integer value is 64 bits wide, then use cmpxchg8b (8-byte compare and exchange) on IA-32 x86.
The variable must be 8-byte aligned.
Example:
mov eax, OldDataA //load Old first 32 bits
mov edx, OldDataB //load Old second 32 bits
mov ebx, NewDataA //load first 32 bits
mov ecx, NewDataB //load second 32 bits
mov edi, Destination //load destination pointer
lock cmpxchg8b qword ptr [edi]
setz al //if the exchange is successful, al is 1, else 0
If the LOCK prefix is omitted from an atomic processor instruction, atomicity across processors in a multiprocessor environment is not guaranteed.
"In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted." (Intel Instruction Set Reference)
Without the LOCK prefix, the operation is only guaranteed not to be interrupted by any event (interrupt) on the current processor/core.
It's interesting to note that some processors don't provide a compare-exchange, but instead provide some other instructions ("Load Linked" and "Conditional Store") that can be used to synthesize the unfortunately-named compare-and-swap (the name sounds like it should be similar to "compare-exchange" but should really be called "compare-and-store" since it does the comparison, stores if the value matches, and indicates whether the value matched and the store was performed). The instructions cannot synthesize compare-exchange semantics (which provides the value that was read in case the compare failed), but may in some cases avoid the ABA problem which is present with Compare-Exchange. Many algorithms are described in terms of "CAS" operations because they can be used on both styles of CPU.
A "Load Linked" instruction tells the processor to read a memory location and watch in some way to see if it might be written. A "Conditional Store" instruction instructs the processor to write a memory location only if nothing can have written it since the last "Load Linked" operation. Note that the determination may be pessimistic; processing an interrupt, for example, may invalidate a "Load-Linked"/"Conditional Store" sequence. Likewise in a multi-processor system, an LL/CS sequence may be invalidated by another CPU accessing a location on the same cache line as the location being watched, even if the actual location being watched wasn't touched. In typical usage, LL/CS are used very close together, with a retry loop, so that erroneous invalidations may slow things down a little but won't cause much trouble.
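The retry-loop pattern described above has a close analogue in portable C11: atomic_compare_exchange_weak is allowed to fail spuriously, which is what lets compilers implement it with an LL/SC pair on CPUs that have one. A minimal sketch, with the function name add_via_cas made up for this example:

```c
#include <stdatomic.h>

/* Adds delta to *counter using a compare-and-swap retry loop
   and returns the new value. */
static int add_via_cas(atomic_int *counter, int delta)
{
    int old = atomic_load(counter);

    /* On failure (real or spurious), `old` is refreshed with the
       current value, so the loop simply tries again with it. */
    while (!atomic_compare_exchange_weak(counter, &old, old + delta)) {
        /* retry */
    }
    return old + delta;
}
```

Because each retry only recomputes old + delta, an occasional spurious failure costs a little time but does not affect correctness.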

Resources