We know that we can copy one structure to another directly by assignment.
struct STR
{
int a;
int b;
};
struct STR s1 = {4, 5};
struct STR s2;
Method 1:
s2 = s1;
will assign s1 to s2.
Method 2:
s2.a = s1.a;
s2.b = s1.b;
In terms of time efficiency, which method is faster, or do both take the same time? Please consider the case of a large structure as well.
In general you cannot tell, because it depends on the compiler, the target architecture, and so on.
However, with modern C compilers and optimization enabled, they will usually be the same. For example, recent GCC on x86-64 generates exactly the same code for the two:
void a1()
{
s2 = s1;
}
void a2()
{
s2.a = s1.a;
s2.b = s1.b;
}
Produces:
a1():
mov rax, QWORD PTR s1[rip]
mov QWORD PTR s2[rip], rax
ret
a2():
mov rax, QWORD PTR s1[rip]
mov QWORD PTR s2[rip], rax
ret
Related
I have a simple question that I just wasn't sure about.
Consider the code below:
#include <stdio.h>
static void turnOn(int *power);
static void turnOff(int *power);
int main(void)
{
int powerIsOn = 0;
turnOn(&powerIsOn);
printf("Power Status: %d\n", powerIsOn);
turnOff(&powerIsOn);
printf("Power Status: %d\n", powerIsOn);
return 0;
}
static void turnOn(int *power)
{
if (!*power)
*power = 1;
// Or
//*power = 1;
return;
}
static void turnOff(int *power)
{
if (*power)
*power = 0;
// Or
// *power = 0;
return;
}
I know that this wouldn't cause a noticeable difference in something this small. But in functions that do some sort of assignment, is it more efficient to check whether a Boolean (or whatever) is already true/false before re-assigning its value?
For example, the turnOn() function only turns the power on if it is off. Would it be any slower or faster just to set it to 1 regardless of the current value?
Thanks for your time.
Either way your code accesses memory: it does so inside the "if", and it does so when you assign 1. In addition, the "if" statement adds a few more instructions to the binary, so if you can simply assign without the "if", that is more efficient.
if (!*power)    // one memory access, plus the branch
    *power = 1; // one more memory access, plus the assignment
Looking at the compiled assembly code for
static void turnOn(int *power)
{
if (!*power)
*power = 1;
return;
}
we get the following code:
turnOn:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
test eax, eax
jne .L4
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], 1
nop
.L4:
nop
pop rbp
ret
And for:
static void turnOn(int *power)
{
*power = 1;
return;
}
the generated code is:
turnOn:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], 1
nop
pop rbp
ret
It seems to me that the machine will run more operations in the first case.
I was using the https://godbolt.org/ compiler explorer.
Both operations involve accessing variables and therefore depend on memory management. Assignment involves re-writing to memory, and therefore is more "expensive" than comparing constant values (booleans, or 1 and 0).
With that being said, with modern hardware these differences are negligible and therefore considered micro-optimizations, which aren't recommended.
I think that can only be answered by profiling the code in a specific scenario.
Reading a variable in C can be cheaper than assigning a value to it if the compiler generates code that reads the value directly (though this depends on the implementation of a given compiler).
Whether it actually executes faster is circumstantial, however. For example, if both functions are called many times in sequence, the extra if statement you added is useless: the condition will always be true, and the check generates additional instructions (on which the CPU spends additional cycles) only to confirm something that never changes. It really needs to be profiled; it is context-dependent.
Here is what the C standard says in 6.7.6.3/7:
If the keyword static also appears within the [ and ] of the array
type derivation, then for each call to the function, the value of the
corresponding actual argument shall provide access to the first
element of an array with at least as many elements as specified by the
size expression.
It is not quite clear what it means. I ran the following example:
main.c
#include <stdio.h>
#include "func.h"
int main(void){
char test[4] = "123";
printf("%c\n", test_func(2, test));
}
And 2 different implementations of the test_func:
static version
func.h
#include <stddef.h>
char test_func(size_t idx, const char[const static 4]);
func.c
char test_func(size_t idx, const char arr[const static 4]){
return arr[idx];
}
non-static version
func.h
#include <stddef.h>
char test_func(size_t idx, const char[const 4]);
func.c
char test_func(size_t idx, const char arr[const 4]){
return arr[idx];
}
I checked the assembly generated by gcc 7.4.0 -O3 for the function in both cases, and it turned out to be completely identical:
Disassembly of the functions
(gdb) disas main
sub rsp,0x18
mov edi,0x2
lea rsi,[rsp+0x4]
mov DWORD PTR [rsp+0x4],0x333231
mov rax,QWORD PTR fs:0x28
mov QWORD PTR [rsp+0x8],rax
xor eax,eax
call 0x740 <test_func>
[...]
(gdb) disas test_func
movzx eax,BYTE PTR [rsi+rdi*1]
ret
Can you give an example where the static keyword gives some benefits (or any differences at all) comparing to non-static counterpart?
Here is an example where static actually makes a difference:
unsigned foo(unsigned a[2])
{
return a[0] ? a[0] * a[1] : 0;
}
clang (for x86-64, with -O3) compiles this to
foo:
mov eax, dword ptr [rdi]
test eax, eax
je .LBB0_1
imul eax, dword ptr [rdi + 4]
ret
.LBB0_1:
xor eax, eax
ret
But after replacing the function parameter with unsigned a[static 2], the result is simply
foo:
mov eax, dword ptr [rdi + 4]
imul eax, dword ptr [rdi]
ret
The conditional branch is not necessary because a[0] * a[1] evaluates to the correct result whether a[0] is zero or not. But without the static keyword, the compiler cannot assume that a[1] can be accessed, and thus has to check a[0].
Currently only clang does this optimization; ICC and gcc produce the same code in both cases.
This isn't used much by compilers in my experience, but one use is that the compiler can assume that the (array decayed into pointer) parameter is not NULL.
Given this function, both gcc and clang (x86) produce identical machine code at -O3:
int func (int a[2])
{
if(a)
return 1;
return 0;
}
Disassembly:
func:
xor eax, eax
test rdi, rdi
setne al
ret
When changing the parameter to int a[static 2], gcc gives the same output as before, but clang does a better job:
func:
mov eax, 1
ret
Clang realizes that a can never be NULL, so it can skip the check.
This is my first question, because I couldn't find anything related to this topic.
Recently, while making a class for my C game engine project I've found something interesting:
struct Stack *S1 = new(Stack);
struct Stack *S2 = new(Stack);
S1->bPush(S1, 1, 2); //at this point
bPush is a function pointer in the structure.
So I wondered what operator -> does in that case, and I discovered:
mov r8b,2 ; a char, written to a low point of register r8
mov dl,1 ; also a char, but to d this time
mov rcx,qword ptr [S1] ; this is the 1st parameter of function
mov rax,qword ptr [S1] ; !Why cannot I use this one?
call qword ptr [rax+1A0h] ; pointer call
So I assume -> writes the object pointer to rcx, and I'd like to use it in functions (methods, as it were). bPush is part of the struct and cannot be called from outside, but I have to pass the pointer S1, which it already has. The question is, how can I do something like
push rcx
// do other call vars
pop rcx
mov qword ptr [this], rcx
before it starts writing the other arguments of the function. Perhaps something with the preprocessor?
It looks like you'd have an easier time (and get asm that's the same or more efficient) if you wrote in C++ so you could use language built-in support for virtual functions, and for running constructors on initialization. Not to mention not having to manually run destructors. You wouldn't need your struct Class hack.
I'd like to implicitly pass *this pointer, because as shown in second asm part it does the same thing twice, yes, it is what I'm looking for, bPush is a part of a struct and it cannot be called from outside, but I have to pass the pointer S1, which it already has.
You get inefficient asm because you disabled optimization.
MSVC -O2 or -Ox doesn't reload the static pointer twice. It does waste a mov instruction copying between registers, but if you want better asm use a better compiler (like gcc or clang).
The oldest MSVC on the Godbolt compiler explorer is CL19.0 from MSVC 2015, which compiles this source
struct Stack {
int stuff[4];
void (*bPush)(struct Stack*, unsigned char value, unsigned char length);
};
struct Stack *const S1 = new(Stack);
int foo(){
S1->bPush(S1, 1, 2);
//S1->bPush(S1, 1, 2);
return 0; // prevent tailcall optimization
}
into this asm (Godbolt)
# MSVC 2015 -O2
int foo(void) PROC ; foo, COMDAT
$LN4:
sub rsp, 40 ; 00000028H
mov rax, QWORD PTR Stack * __ptr64 __ptr64 S1
mov r8b, 2
mov dl, 1
mov rcx, rax ;; copy RAX to the arg-passing register
call QWORD PTR [rax+16]
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
int foo(void) ENDP ; foo
(I compiled in C++ mode so I could write S1 = new(Stack) without having to copy your github code, and write it at global scope with a non-constant initializer.)
Clang7.0 -O3 loads into RCX straight away:
# clang -O3
foo():
sub rsp, 40
mov rcx, qword ptr [rip + S1]
mov dl, 1
mov r8b, 2
call qword ptr [rcx + 16] # uses the arg-passing register
xor eax, eax
add rsp, 40
ret
Strangely, clang only decides to use low-byte registers when targeting the Windows ABI with __attribute__((ms_abi)). It uses mov esi, 1 to avoid false dependencies when targeting its default Linux calling convention, not mov sil, 1.
If you are already using optimization and still see the reload, then your even older MSVC is simply worse. In that case you probably can't do anything in the C source to fix it, although you might try a local variable (struct Stack *p = S1;) to hand-hold the compiler into loading the pointer into a register once and reusing it from there.
mov rax,QWORD PTR [rbp-0x10]
mov eax,DWORD PTR [rax]
add eax,0x1
mov DWORD PTR [rbp-0x14], eax
The following lines are written in C and compiled with GCC in a GNU/Linux environment.
Assembly code is for int b = *a + 1;.
...
int a = 5;
int* ptr = &a;
int b = *a + 1;
It dereferences what is at the address of a, adds 1 to that, and then stores the result in a new variable.
What I don't understand is the second line of that assembly code. Does it mean that I cut the QWORD to get a DWORD (one part of the QWORD) and store that into eax?
Since the code is only a few lines long, I would love to have it broken into step-by-step sections, just to confirm that I'm on the right track and to figure out what that second line does. Thank you.
What I don't understand is the second line of that assembly code. Does it mean that I cut the QWORD to get a DWORD (one part of the QWORD) and store that into eax?
No, the 2nd line dereferences it. There's no splitting up of a qword into two dword halves. (Writing EAX zeros the upper 32 bits of RAX).
It just happens to use the same register that it was using for the pointer, because it doesn't need the pointer anymore.
Compile with optimizations enabled; it's much easier to see what's happening if gcc isn't storing/reloading all the time. (How to remove "noise" from GCC/clang assembly output?)
int foo(int *ptr) {
return *ptr + 1;
}
mov eax, DWORD PTR [rdi]
add eax, 1
ret
(On Godbolt)
int a = 5;
int* ptr = &a;
int b = *a + 1;
Your example is broken: you try to dereference an integer value as if it were a pointer (in this case 5), and it will not compile at all, because the operand of unary * must have pointer type.
To make it compile you would need to cast it first (which then makes the dereference undefined behaviour, since 5 is not a valid address):
int b = *(int *)a + 1;
https://godbolt.org/g/Yo8dd1
Explanation of your assembly code:
line 1: loads rax with the value of a (in this case 5)
line 2: dereferences this value (it reads from address 5, so you will probably get a segmentation fault). This code goes through the stack at all only because you used the -O0 option.
When copying one structure variable to another in C, does the compiler emit a memcpy or an item-by-item copy behind the scenes? Can this be compiler dependent?
It's heavily compiler dependent.
Consider a struct with just 2 fields
struct A { int a, b; };
Copying this struct in VS2015 in a DEBUG build generates the following asm.
struct A b;
b = a;
mov eax,dword ptr [a]
mov dword ptr [b],eax
mov ecx,dword ptr [ebp-8]
mov dword ptr [ebp-18h],ecx
Now add an array of 100 chars and copy again:
struct A
{
int a;
int b;
char x[100];
};
struct A a = { 1,2, {'1', '2'} };
struct A b;
b = a;
mov ecx,1Bh
lea esi,[a]
lea edi,[b]
rep movs dword ptr es:[edi],dword ptr [esi]
Now, essentially, a memcpy is done from the address of a to the address of b.
It depends on a lot of things: the layout of the struct, the compiler, the optimization level... many factors.
You should not even think about that. Compilers are only required to make the observable results of the generated code match what you asked for; beyond that, they can optimize however they like. That means you should let the compiler choose how it copies structs.
The only case where this rule does not apply is low-level optimization, but there other rules apply:
never use low-level optimization at early development stages
only do it after identifying the bottlenecks in your code by profiling
always use benchmarking to choose the best variant
remember that such low-level optimization only makes sense for one (version of a) compiler on one architecture.