Cutting QWORD to get DWORD and performing calculations upon it - c

mov rax,QWORD PTR [rbp-0x10]
mov eax,DWORD PTR [rax]
add eax,0x1
mov DWORD PTR [rbp-0x14], eax
The following lines are written in C and compiled with GCC in a GNU/Linux environment.
The assembly code is for int b = *a + 1;.
...
int a = 5;
int* ptr = &a;
int b = *a + 1;
That is, dereference what is at the address of a, add 1 to it, and store the result in a new variable.
What I don't understand is the second line of that assembly code. Does it mean that I cut the QWORD to get a DWORD (one part of the QWORD) and store that into eax?
Since the code is only a few lines long, I would love to have it broken down step by step, both to confirm that I'm on the right track and to figure out what that second line does. Thank you.

What I don't understand is the second line of that assembly code. Does it mean that I cut the QWORD to get a DWORD (one part of the QWORD) and store that into eax?
No, the 2nd line dereferences it. There's no splitting up of a qword into two dword halves. (Writing EAX zeros the upper 32 bits of RAX).
It just happens to use the same register that it was using for the pointer, because it doesn't need the pointer anymore.
Compile with optimizations enabled; it's much easier to see what's happening if gcc isn't storing/reloading all the time. (How to remove "noise" from GCC/clang assembly output?)
int foo(int *ptr) {
return *ptr + 1;
}
mov eax, DWORD PTR [rdi]
add eax, 1
ret
(On Godbolt)
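As a rough sketch, the four unoptimized instructions from the question can be mirrored one-to-one in C. The function name and the temporaries are made up for illustration, and the 8-byte pointer size is an assumption that holds for x86-64.

```c
/* Hypothetical helper: each statement mirrors one instruction of the
   unoptimized listing in the question (x86-64 assumed). */
int load_and_increment(const int *ptr)
{
    const int *tmp = ptr; /* mov rax, QWORD PTR [rbp-0x10]: load the 8-byte pointer */
    int value = *tmp;     /* mov eax, DWORD PTR [rax]: load the 4-byte int behind it */
    value += 1;           /* add eax, 0x1 */
    int b = value;        /* mov DWORD PTR [rbp-0x14], eax: store into b */
    return b;
}
```

With int a = 5;, load_and_increment(&a) returns 6. Nothing is split: the second statement reads a separate 4-byte object whose address happened to be in rax.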

int a = 5;
int* ptr = &a;
int b = *a + 1;
Your example would be undefined behaviour, as it dereferences an integer value converted to a pointer (in this case 5), but as written it will not compile at all: a is an int, and the unary * operator needs a pointer operand.
To make it compile you need to cast it first:
int b = *(int *)a + 1;
https://godbolt.org/g/Yo8dd1
Explanation of your assembly code:
line 1: loads rax with the value of a (in this case 5)
line 2: dereferences this value (reads from address 5, so you will probably get a segmentation fault). This code loads from the stack only because you used the -O0 option.

Related

What's the fastest way to copy two adjacent bytes in C?

Ok so let's start with the most obvious solution:
memcpy(Ptr, (const char[]){'a', 'b'}, 2);
Calling a library function carries quite some overhead. Compilers sometimes don't optimize the call away, and I'd rather not rely on compiler optimizations: GCC is smart, but if I'm porting a program to more exotic platforms with poor compilers I don't want to depend on it.
So now there's a more direct approach:
Ptr[0] = 'a';
Ptr[1] = 'b';
It involves no library-function overhead, but it makes two separate assignments. Third, we have a type pun:
*(uint16_t*)Ptr = *(uint16_t*)(unsigned char[]){'a', 'b'};
Which one should I use in a bottleneck? What's the fastest way to copy only two bytes in C?
Regards,
Hank Sauri
Only two of the approaches you suggested are correct:
memcpy(Ptr, (const char[]){'a', 'b'}, 2);
and
Ptr[0] = 'a';
Ptr[1] = 'b';
On x86-64 GCC 10.2, both compile to identical code:
mov eax, 25185
mov WORD PTR [something], ax
This is possible because of the as-if rule.
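The constant 25185 is just the two characters combined into one 16-bit little-endian value: 'a' is 0x61 and 'b' is 0x62 in ASCII, and 0x6261 is 25185. A small sketch of that arithmetic (pack_le16 is a made-up name, and ASCII is assumed):

```c
#include <stdint.h>

/* Hypothetical helper: combine two bytes the way a little-endian
   16-bit store lays them out (first byte at the lower address). */
uint16_t pack_le16(char first, char second)
{
    return (uint16_t)((unsigned char)first | ((unsigned char)second << 8));
}
```

pack_le16('a', 'b') evaluates to 0x6261, i.e. 25185, which is the immediate the compiler loads into eax.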
Since a good compiler can figure out that these are identical, use the one that is easier to write in your case. If you're setting one or two bytes, use the latter; if several, use the former, or use a string instead of a compound literal array.
The third one you suggested
*(uint16_t*)Ptr = *(uint16_t*)(unsigned char[]){'a', 'b'};
also compiles to the same code when using x86-64 GCC 10.2, i.e. it would behave identically in this case.
But in addition it has 2-4 points of undefined behaviour: it violates strict aliasing twice (once at the source, once at the destination), possibly coupled with unaligned memory access at both source and destination. Undefined behaviour does not mean that it cannot work as you intended, but neither does it mean that it has to work as you intended. The behaviour is undefined, and it can fail to work on any processor, including x86. Why would you care about performance on a bad compiler so much that you would write code that could fail to work on a good compiler?!
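If a 16-bit access really is what you want to express, one sketch that avoids both the aliasing and the alignment problem goes through memcpy; optimizing compilers typically reduce a fixed-size 2-byte memcpy to a single load or store. The helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: memcpy has no alignment or aliasing
   requirements, so dst/src may point anywhere inside a char buffer. */
void store_u16(void *dst, uint16_t value)
{
    memcpy(dst, &value, sizeof value);
}

uint16_t load_u16(const void *src)
{
    uint16_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

For example, store_u16(buf + 1, 0x1234) is well defined even though buf + 1 is an odd address, and load_u16(buf + 1) reads the same value back.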
When in doubt, use the Compiler Explorer.
#include <string.h>
#include <stdint.h>
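void c1(char *Ptr) {
    /* copy two bytes from a compound literal */
    memcpy(Ptr, (const char[]){'a', 'b'}, 2);
}
void c2(char *Ptr) {
    /* two separate byte stores */
    Ptr[0] = 'a';
    Ptr[1] = 'b';
}
void c3(char *Ptr) {
    // Bad bad not good: strict-aliasing violation, possibly misaligned.
    *(uint16_t*)Ptr = *(uint16_t*)(unsigned char[]){'a', 'b'};
}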
compiles down to (GCC)
c1:
mov eax, 25185
mov WORD PTR [rdi], ax
ret
c2:
mov eax, 25185
mov WORD PTR [rdi], ax
ret
c3:
mov eax, 25185
mov WORD PTR [rdi], ax
ret
or (Clang)
c1: # #c1
mov word ptr [rdi], 25185
ret
c2: # #c2
mov word ptr [rdi], 25185
ret
c3: # #c3
mov word ptr [rdi], 25185
ret
In C, this approach is no doubt the fastest:
Ptr[0] = 'a';
Ptr[1] = 'b';
This is why:
All Intel and ARM CPUs can embed constant data (also called immediate data) in certain assembly instructions. These include the memory-to-CPU and CPU-to-memory data transfer instructions such as MOV.
That means that when those instructions are fetched from PROGRAM memory into the CPU, the immediate data arrives along with the instruction.
'a' and 'b' are constants and can therefore enter the CPU along with the MOV instruction.
Once the immediate data is in the CPU, the CPU itself has to make only one access to DATA memory to write 'a' to Ptr[0].
Ciao,
Enrico Migliore

Value of eax register in the assembly program

I am completing an assignment related to C programming and assembly language. Here is the simple C program:
int multiply(int a, int b) {
int k = 4;
int c,d, e;
c = a*b ;
d = a*b + k*c;
return d;
}
And its optimised assembly is
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
_multiply PROC
mov eax, DWORD PTR _a$[esp-4]
imul eax, DWORD PTR _b$[esp-4]
lea eax, DWORD PTR [eax+eax*4]
ret 0
_multiply ENDP
I want to know the value of the eax register after this line of assembly:
lea eax, DWORD PTR [eax+eax*4]
I know that when we add integers in assembly, the result is stored in the destination, and when we multiply, it is stored in eax. So if I call multiply(3, 8), the value of the eax register after that line should be 120. Am I correct?
lea is "load effective address".
Instruction sets can have some quite complex multi-register address calculation modes that are generally used just for reading and writing data to memory, but lea allows the programmer to get the address that would be accessed by the instruction.
Effectively, it performs the calculation inside the brackets and returns that value; it doesn't access memory (which is what brackets usually imply).
In this case it is being used as a quick way to multiply by 5, because the rest of the function has been optimised away!
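To make that concrete: d = a*b + 4*(a*b) = 5*(a*b), and lea eax, [eax+eax*4] computes exactly eax + eax*4. A sketch that mirrors the three instructions in C (multiply_asm is a made-up name; the local variable eax only stands in for the register):

```c
/* The function from the question, unchanged in behaviour. */
int multiply(int a, int b)
{
    int k = 4;
    int c = a * b;
    int d = a * b + k * c;
    return d;
}

/* Hypothetical mirror of the optimized listing. */
int multiply_asm(int a, int b)
{
    int eax = a;         /* mov  eax, DWORD PTR _a$[esp-4] */
    eax = eax * b;       /* imul eax, DWORD PTR _b$[esp-4] */
    eax = eax + eax * 4; /* lea  eax, [eax+eax*4]          */
    return eax;          /* ret: the result is in eax      */
}
```

For multiply(3, 8) the imul leaves 24 in eax and the lea turns it into 24 + 96 = 120, so the value given in the question is right.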

Is there any way to save registers before jumping into function?

This is my first question, because I couldn't find anything related to this topic.
Recently, while making a class for my C game engine project, I found something interesting:
struct Stack *S1 = new(Stack);
struct Stack *S2 = new(Stack);
S1->bPush(S1, 1, 2); //at this point
bPush is a function pointer in the structure.
So I wondered what the -> operator does in that case, and I discovered:
mov r8b,2 ; a char, written to the low byte of register r8
mov dl,1 ; also a char, but to the low byte of rdx this time
mov rcx,qword ptr [S1] ; this is the 1st parameter of the function
mov rax,qword ptr [S1] ; !Why can't I use this one?
call qword ptr [rax+1A0h] ; pointer call
So I assume -> writes the object pointer to rcx, and I'd like to use it in functions (methods, really). So the question is: how can I do something like
push rcx
// do other call vars
pop rcx
mov qword ptr [this], rcx
before it starts writing the other arguments of the function? Something with the preprocessor?
It looks like you'd have an easier time (and get asm that's the same or more efficient) if you wrote in C++ so you could use language built-in support for virtual functions, and for running constructors on initialization. Not to mention not having to manually run destructors. You wouldn't need your struct Class hack.
I'd like to implicitly pass *this pointer, because as shown in second asm part it does the same thing twice, yes, it is what I'm looking for, bPush is a part of a struct and it cannot be called from outside, but I have to pass the pointer S1, which it already has.
You get inefficient asm because you disabled optimization.
MSVC -O2 or -Ox doesn't reload the static pointer twice. It does waste a mov instruction copying between registers, but if you want better asm use a better compiler (like gcc or clang).
The oldest MSVC on the Godbolt compiler explorer is CL19.0 from MSVC 2015, which compiles this source
struct Stack {
int stuff[4];
void (*bPush)(struct Stack*, unsigned char value, unsigned char length);
};
struct Stack *const S1 = new(Stack);
int foo(){
S1->bPush(S1, 1, 2);
//S1->bPush(S1, 1, 2);
return 0; // prevent tailcall optimization
}
into this asm (Godbolt)
# MSVC 2015 -O2
int foo(void) PROC ; foo, COMDAT
$LN4:
sub rsp, 40 ; 00000028H
mov rax, QWORD PTR Stack * __ptr64 __ptr64 S1
mov r8b, 2
mov dl, 1
mov rcx, rax ;; copy RAX to the arg-passing register
call QWORD PTR [rax+16]
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
int foo(void) ENDP ; foo
(I compiled in C++ mode so I could write S1 = new(Stack) without having to copy your github code, and write it at global scope with a non-constant initializer.)
Clang7.0 -O3 loads into RCX straight away:
# clang -O3
foo():
sub rsp, 40
mov rcx, qword ptr [rip + S1]
mov dl, 1
mov r8b, 2
call qword ptr [rcx + 16] # uses the arg-passing register
xor eax, eax
add rsp, 40
ret
Strangely, clang only decides to use low-byte registers when targeting the Windows ABI with __attribute__((ms_abi)). It uses mov esi, 1 to avoid false dependencies when targeting its default Linux calling convention, not mov sil, 1.
Or, if you are using optimization, then it's because even older MSVC is even worse. In that case you probably can't do anything in the C source to fix it, although you might try using a struct Stack *p = S1 local variable to hand-hold the compiler into loading it into a register once and reusing it from there.
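On the "something with the preprocessor?" part of the question: one common C idiom, shown here only as a sketch, is a variadic macro that passes the object as the first argument. CALL, stack_push, the data/size members and the meaning given to length are all made up for illustration; only the bPush signature follows the struct used above. Note that the macro evaluates obj twice, so the argument should have no side effects.

```c
#include <stddef.h>

struct Stack {
    unsigned char data[16];
    size_t size;
    void (*bPush)(struct Stack *self, unsigned char value, unsigned char length);
};

/* Hypothetical method: appends `value` `length` times (made-up semantics). */
void stack_push(struct Stack *self, unsigned char value, unsigned char length)
{
    for (unsigned char i = 0; i < length && self->size < sizeof self->data; ++i)
        self->data[self->size++] = value;
}

/* Hypothetical macro: CALL(S1, bPush, 1, 2) expands to S1->bPush(S1, 1, 2). */
#define CALL(obj, method, ...) ((obj)->method((obj), __VA_ARGS__))
```

The macro only saves typing S1 twice; the call it expands to is the same one written by hand, so how often the pointer is loaded is still up to the compiler and its optimization level.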

Video memory access and postfix incrementation

I have a problem with memory access and postfix incrementation :/
I need to access video memory at boot, so I create a pointer to address 0xB8000 and then increment the pointer to access the next location.
Basically, the code would be:
volatile char *p = (volatile char *)0xB8000;
for (int i = 0; i < 5; ++i)
*(p++) = 'A';
This way, p points to the proper memory address, and after each access it is incremented (I know there are 2 bytes for each character displayed, but that is not the problem here).
But this doesn't work: no characters are displayed. If I change the incrementation to prefix like this, it works and I can see the characters on the screen!
volatile char *p = (volatile char *)0xB8000;
for (int i = 0; i < 5; ++i)
*(++p) = 'A';
So, I checked the assembly code:
; Postfix
mov ecx, DWORD PTR _p$[ebp]
mov BYTE PTR [ecx], 65 ; 'A' character
mov edx, DWORD PTR _p$[ebp]
add edx, 1
mov DWORD PTR _p$[ebp], edx
; Prefix
mov ecx, DWORD PTR _p$[ebp]
add ecx, 1
mov DWORD PTR _p$[ebp], ecx
mov edx, DWORD PTR _p$[ebp]
mov BYTE PTR [edx], 65 ; 'A' character
I can't spot the difference. By the way, I could use the prefix incrementation, but I would like to understand why the postfix version does not work :/
The assembly code is from the Visual C++ compiler; I don't have GCC at work :/
EDIT: I know the difference between prefix and postfix incrementation, and I see the difference between the assembly listings shown here. But IMO, none of these differences should lead to characters not being printed on screen.
About the attribute byte: I know I should set it properly. I wanted to keep the example and its assembly code light, but actually, with the incrementation the attribute byte is always set to 'A', which leads to a blue letter on a red background.
Thank you :)
After a few more tests, I found the likely cause of this error: the .rodata section was not properly linked. It's working better now.
For more details, I followed some of the instructions available in an OSDev tutorial ;)
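For reference, a sketch of writing character/attribute pairs instead of single bytes. In VGA text mode each cell is two bytes, the character code followed by the attribute byte, so on little-endian x86 a cell can be written as one 16-bit value. put_text is a made-up helper, and it takes the base address as a parameter so that it can be tried on an ordinary array; on the real hardware the base would be (volatile uint16_t *)0xB8000.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: character in the low byte, attribute in the high byte. */
void put_text(volatile uint16_t *cells, const char *text, uint8_t attr)
{
    for (size_t i = 0; text[i] != '\0'; ++i)
        cells[i] = (uint16_t)(((uint16_t)attr << 8) | (unsigned char)text[i]);
}
```

With attr = 0x07 (light grey on black), the string "AB" becomes the cells 0x0741 and 0x0742, so no attribute byte is ever overwritten with a character code.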

Arrays pointers on 32bit and 64bit systems

The following code prints different results on 32bit and 64bit systems:
#include <stdio.h>
void swapArray(int **a, int **b)
{
int *temp = *a;
*a = *b;
*b = temp;
}
int main()
{
int a[2] = {1, 3};
int b[2] = {2, 4};
swapArray(&a, &b);
printf("%d\n", a[0]);
printf("%d\n", a[1]);
return 0;
}
After compiling it on a 32-bit system, the output is:
2
3
On 64bit the output is:
2
4
As I understand, the function swapArray just swaps the pointers to the first elements in a and b. So after calling swapArray, a should point to 2 and b should point to 1.
For this reason a[0] should yield 2, and a[1] should reference the next byte in memory after the location of 2, which contains 4.
Can anyone please explain?
Edit:
Thanks to the comments and answers, I now notice that &a and &b are of type int (*)[2] and not int **. This obviously makes the code incorrect (and indeed I get a compiler warning). It is intriguing, though, that the compiler (gcc) gives just a warning and not an error.
I am still left with the question of what causes the different results on different systems, but since the code is incorrect, it is less relevant.
Edit 2:
As for the different results on different systems, I suggest reading AndreyT's comment.
swapArray(&a, &b);
&a and &b are not of type int ** but of type int (*)[2]. BTW your compiler is kind enough to accept your program but a compiler has the right to refuse to translate it.
Before answering your question, let's see what happens under the hood during a pointer operation. I'm using a very simple piece of code to demonstrate this:
#include <stdio.h>
int main() {
int *p;
int **p2;
int x = 3;
p = &x;
p2 = &p;
return 0;
}
Now look at the disassembly :
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000400474 <+0>: push rbp
0x0000000000400475 <+1>: mov rbp,rsp
0x0000000000400478 <+4>: mov DWORD PTR [rbp-0x14],0x3
0x000000000040047f <+11>: lea rax,[rbp-0x14]
0x0000000000400483 <+15>: mov QWORD PTR [rbp-0x10],rax
0x0000000000400487 <+19>: lea rax,[rbp-0x10]
0x000000000040048b <+23>: mov QWORD PTR [rbp-0x8],rax
=> 0x000000000040048f <+27>: mov eax,0x0
0x0000000000400494 <+32>: leave
0x0000000000400495 <+33>: ret
The disassembly is fairly self-evident, but a few notes need to be added here.
My function's stack frame starts here:
0x0000000000400474 <+0>: push rbp
0x0000000000400475 <+1>: mov rbp,rsp
So let's see what we have for now:
(gdb) info registers $rbp
rbp 0x7fffffffe110 0x7fffffffe110
Here we are putting the value 3 at the address [rbp - 0x14]. Let's look at the memory:
(gdb) x/1xw $rbp - 0x14
0x7fffffffe0fc: 0x00000003
It's important to notice that the DWORD data type is used, which is 32 bits wide. As a side note, integer literals like 3 are treated as 4-byte units.
The next instruction uses lea to load the effective address of the value just saved by the earlier instruction.
0x000000000040047f <+11>: lea rax,[rbp-0x14]
It means that now $rax will have the value 0x7fffffffe0fc.
(gdb) p/x $rax
$4 = 0x7fffffffe0fc
Next we will save this address into memory using
0x0000000000400483 <+15>: mov QWORD PTR [rbp-0x10],rax
The important thing to note is that a QWORD is used here, because 64-bit systems have an 8-byte native pointer size. The earlier mov instruction used 0x14 - 0x10 = 4 bytes.
Next we have :
0x0000000000400487 <+19>: lea rax,[rbp-0x10]
0x000000000040048b <+23>: mov QWORD PTR [rbp-0x8],rax
This is the second level of indirection. Again, every value that holds an address is a QWORD. This is important to take note of.
Now let's come to your code.
Before the call to swapArray you have:
=> 0x00000000004004fe <+8>: mov DWORD PTR [rbp-0x10],0x1
0x0000000000400505 <+15>: mov DWORD PTR [rbp-0xc],0x3
0x000000000040050c <+22>: mov DWORD PTR [rbp-0x20],0x2
0x0000000000400513 <+29>: mov DWORD PTR [rbp-0x1c],0x4
0x000000000040051a <+36>: lea rdx,[rbp-0x20]
0x000000000040051e <+40>: lea rax,[rbp-0x10]
0x0000000000400522 <+44>: mov rsi,rdx
0x0000000000400525 <+47>: mov rdi,rax
This is straightforward. Your arrays are initialized, and the effect of the & operator is visible when the effective addresses of the starts of the arrays are loaded into $rdi and $rsi.
Now let's see what it's doing inside swapArray().
The starts of your arrays are passed in $rdi and $rsi. So let's see their contents:
(gdb) p/x $rdi
$2 = 0x7fffffffe100
(gdb) p/x $rsi
$3 = 0x7fffffffe0f0
0x00000000004004c8 <+4>: mov QWORD PTR [rbp-0x18],rdi
0x00000000004004cc <+8>: mov QWORD PTR [rbp-0x20],rsi
Now the first statement int *temp = *a is performed by following instructions.
0x00000000004004d0 <+12>: mov rax,QWORD PTR [rbp-0x18]
0x00000000004004d4 <+16>: mov rax,QWORD PTR [rax]
0x00000000004004d7 <+19>: mov QWORD PTR [rbp-0x8],rax
Now comes the defining moment: what's happening with your *a?
It loads into $rax the value stored at [rbp - 0x18], where the value of $rdi was saved, which in turn holds the address of the first element of the first array.
It then performs another indirection, using the address stored in $rax to fetch a QWORD and load it into $rax. So what will it return? It will return a QWORD from 0x7fffffffe100, which in effect forms an 8-byte quantity from the two 4-byte quantities saved there. To elaborate,
the memory there looks like this:
(gdb) x/2xw $rdi
0x7fffffffe100: 0x00000001 0x00000003
Now if you fetch a QWORD
(gdb) x/1xg $rdi
0x7fffffffe100: 0x0000000300000001
So you are already in trouble, because you are fetching with the wrong width.
The rest of the code can be explained in a similar manner.
Now why is it different on a 32-bit platform? Because on a 32-bit platform the native pointer width is 4 bytes, so things will be different there. The main problem with your semantically incorrect code originates from the difference in width between the integer type and the native pointer type. If both have the same width, your code may still appear to work.
But you should never write code that assumes the size of native types. That's what standards are for, and that's why your compiler is giving you a warning.
From the language point of view it's a type mismatch, which was already pointed out in the earlier answers, so I'm not going into that.
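One way to see both results, as a sketch rather than a statement of what any particular compiler must do: with the mistyped arguments, swapArray ends up exchanging the first sizeof(int *) bytes of the two arrays. The helper below (swap_prefix is a made-up name, and a 4-byte int is assumed) models that with an explicit byte count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: exchange the first nbytes bytes of a and b,
   where nbytes stands in for the pointer size (at most 8 here). */
void swap_prefix(int *a, int *b, size_t nbytes)
{
    unsigned char temp[8];
    if (nbytes > sizeof temp)
        return;
    memcpy(temp, a, nbytes); /* int *temp = *a; */
    memcpy(a, b, nbytes);    /* *a = *b;        */
    memcpy(b, temp, nbytes); /* *b = temp;      */
}
```

Starting from a = {1, 3} and b = {2, 4}, swap_prefix(a, b, 4) leaves a as {2, 3}, the output reported for the 32-bit build, while swap_prefix(a, b, 8) leaves a as {2, 4}, the output reported for the 64-bit build.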
You can't swap arrays using the pointer trick (they are not pointers!). You would either have to create pointers to those arrays and use the pointers or dynamically allocate the arrays using malloc etc.
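Both options can be sketched as follows; swap_int_ptrs and swap_int_arrays are made-up names. The first swaps two real int * variables and leaves the arrays untouched, the second exchanges the array contents element by element.

```c
#include <stddef.h>

/* Swap two pointers; the arrays they point into are not modified. */
void swap_int_ptrs(int **a, int **b)
{
    int *temp = *a;
    *a = *b;
    *b = temp;
}

/* Swap the contents of two arrays of n ints each. */
void swap_int_arrays(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int temp = a[i];
        a[i] = b[i];
        b[i] = temp;
    }
}
```

With int *pa = a, *pb = b; a call swap_int_ptrs(&pa, &pb) makes pa[0] and pa[1] read 2 and 4, with no pointer type mismatch and no dependence on the pointer size.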
The results I get on a 64-bit system are different from yours; for example, I get:
2
3
test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped
And with clang on my mac I get an error:
test.cpp: In function ‘int main()’:
test.cpp:13: error: cannot convert ‘int (*)[2]’ to ‘int**’ for argument ‘1’ to ‘void swapArray(int**, int**)’
I assume that this is undefined behavior and you are trying to interpret what is probably junk output.

Resources