This question regards the difference between the volatile and extern variable and also the compiler optimization.
One extern variable defined in main file and used in one more source file, like this:
ExternTest.cpp:
short ExtGlobal;
void Fun();
int _tmain(int argc, _TCHAR* argv[])
{
ExtGlobal=1000;
while (ExtGlobal < 2000)
{
Fun();
}
return 0;
}
Source1.cpp:
extern short ExtGlobal;
void Fun()
{
ExtGlobal++;
}
The assembly generated for this in the vs2012 as below:
ExternTest.cpp assembly for accessing the external variable
ExtGlobal=1000;
013913EE mov eax,3E8h
013913F3 mov word ptr ds:[01398130h],ax
while (ExtGlobal < 2000)
013913F9 movsx eax,word ptr ds:[1398130h]
01391400 cmp eax,7D0h
01391405 jge wmain+3Eh (0139140Eh)
Source.cpp assembly for modifying the extern variable
ExtGlobal++;
0139145E mov ax,word ptr ds:[01398130h]
01391464 add ax,1
01391468 mov word ptr ds:[01398130h],ax
From the above assembly, every access to the variable "ExtGlobal" in the while loop reads the value from the corresponding address. If i add volatile to the external variable the same assembly code was generated. Volatile usage in two different threads and external variable usage in two different functions are same.
Asking about extern and volatile is like asking about peanuts and gorillas. They're completely unrelated.
extern is used simply to tell the compiler, "Hey, don't expect to find the definition of this symbol in this C file. Let the linker fix it up at the end."
volatile essentially tells the compiler, "Never trust the value of this variable. Even if you just stored a value from a register to that memory location, don't re-use the value in the register - make sure to re-read it from memory."
If you want to see that volatile causes different code to be generated, write a series of reads/writes from the variable.
For example, compiling this code in cygwin, with gcc -O1 -c,
int i;
void foo() {
i = 4;
i += 2;
i -= 1;
}
generates the following assembly:
_foo proc near
mov dword ptr ds:_i, 5
retn
_foo endp
Note that the compiler knew what the result would be, so it just went ahead and optimized it.
Now, adding volatile to int i generates the following:
public _foo
_foo proc near
mov dword ptr ds:_i, 4
mov eax, dword ptr ds:_i
add eax, 2
mov dword ptr ds:_i, eax
mov eax, dword ptr ds:_i
sub eax, 1
mov dword ptr ds:_i, eax
retn
_foo endp
The compiler never trusts the value of i, and always re-loads it from memory.
Related
I'm interested in systems programming, and want to see how structs are implemented in assembly, and how they are linked.
I've written three short .c codes, with same named structs but in different files and compiled and linked together, but I can't understand the output.
I believed that the struct is just a contiguous block of memory in assembly, in the data segment. But, I can't access the values after the first int data, either by using pointers or by having each function use its corresponding files' offsets. Can someone explain the output?
I tried to change data types, implement chars to avoid padding or endian problems. I tried using a char* pointer to print the entire structure, which surprisingly gives me weird values (not random, since they are the same in every execution)
#include <stdio.h>
struct S1
{
int i1;
int i2;
int i3;
int i4;
int i5;
int i6;
};
int main()
{
struct S1 s;
s.i1 = 5;
s.i2 = 10;
s.i3 = 15;
s.i4 = 16;
func1(s); //Implicit calls, no declarations needed
// because linker will know where to find defs
func2(s);
}
#include <stdio.h>
struct S1
{
int i1;
int i2;
float i3;
};
void func1(struct S1 s)
{
printf("In func1 : %d %d %f %lu\n", s.i1, s.i2, s.i3, sizeof(s));
};
#include <stdio.h>
struct S3
{
double i1;
int i2;
};
void func2(struct S3 s)
{
printf("In func2 : %lf %d %lu \n", s.i1, s.i2, sizeof(s));
};
I'm getting outputs:
In func1 : 1 0 0.000000 12
In func2 : 0.000000 1 16
I expected the values to be printed
As GCC 9.2 compiles your file with main, we see these instructions for the call to func1:
mov rdx, QWORD PTR [rbp-16]
mov rax, QWORD PTR [rbp-8]
mov rdi, rdx
mov rsi, rax
mov eax, 0
call func1
Note that the compiler has loaded data into general registers—rsi, rdi, and so on. Compare this to the instructions in func1:
mov rdx, rdi
movd eax, xmm0
mov QWORD PTR [rbp-16], rdx
mov DWORD PTR [rbp-8], eax
movss xmm0, DWORD PTR [rbp-8]
In particular, note the movss instruction. That is attempting to retrieve the float i3 member from the xmm0 register. But, as we have seen, the calling routine did not put anything in the xmm0 register.
The compiler has a specification for how arguments are passed between routines. This is called an Application Binary Interface. (The ABI is shared by software intended to be compatible on a particular platform and is often recommended by the processor manufacturer.) For small structures, at least in this case, the compiler does not pass them by pointing to them in memory or reproducing their exact layout. Instead, the members are passed individually, as if they were separate arguments.
Because of this, your code is not studying how structures are laid out in memory. It is studying how structures are passed in function calls. And part of the answer is that members are sometimes passed individually, and where they are passed depends in part on their types. Because main is passing only integer data, it uses registers for integer arguments. Because func1 is expecting some floating-point data, it looks in a register for that. The result is that func1 never gets the data that is passed for i3 and i4.
While Eric's answer is good, I'd like to highlight one additional misunderstanding your question contains:
func1(s); //Implicit calls, no declarations needed
The comment is incorrect. C always requires declarations of functions. It does allow you to declare it without a prototype, as:
void func1();
However, to call it, the promoted type of the arguments must match the actual type in the function's definition (even if that definition is in another translation unit). If it does not match, the behavior is undefined.
Here is the explanation of what it means 6.7.6.3/7:
If the keyword static also appears within the [ and ] of the array
type derivation, then for each call to the function, the value of the
corresponding actual argument shall provide access to the first
element of an array with at least as many elements as specified by the
size expression.
It is not quite clear what it means. I ran the following example:
main.c
#include "func.h"
int main(void){
char test[4] = "123";
printf("%c\n", test_func(2, test));
}
And 2 different implementations of the test_func:
static version
func.h
char test_func(size_t idx, const char[const static 4]);
func.c
char test_func(size_t idx, const char arr[const static 4]){
return arr[idx];
}
non-static version
func.h
char test_func(size_t idx, const char[const 4]);
func.c
char test_func(size_t idx, const char arr[const 4]){
return arr[idx];
}
I checked the assembly code compiled with gcc 7.4.0 -O3 of the function in both of the cases and it turned out to be completely identical:
Disassembly of the functions
(gdb) disas main
sub rsp,0x18
mov edi,0x2
lea rsi,[rsp+0x4]
mov DWORD PTR [rsp+0x4],0x333231
mov rax,QWORD PTR fs:0x28
mov QWORD PTR [rsp+0x8],rax
xor eax,eax
call 0x740 <test_func>
[...]
(gdb) disas test_func
movzx eax,BYTE PTR [rsi+rdi*1]
ret
Can you give an example where the static keyword gives some benefits (or any differences at all) comparing to non-static counterpart?
Here is an example where static actually makes a difference:
unsigned foo(unsigned a[2])
{
return a[0] ? a[0] * a[1] : 0;
}
clang (for x86-64, with -O3) compiles this to
foo:
mov eax, dword ptr [rdi]
test eax, eax
je .LBB0_1
imul eax, dword ptr [rdi + 4]
ret
.LBB0_1:
xor eax, eax
ret
But after replacing the function parameter with unsigned a[static 2], the result is simply
foo:
mov eax, dword ptr [rdi + 4]
imul eax, dword ptr [rdi]
ret
The conditional branch is not necessary because a[0] * a[1] evaluates to the correct result whether a[0] is zero or not. But without the static keyword, the compiler cannot assume that a[1] can be accessed, and thus has to check a[0].
Currently only clang does this optimization; ICC and gcc produce the same code in both cases.
This isn't used much by compilers in my experience, but one use is that the compiler can assume that the (array decayed into pointer) parameter is not NULL.
Given this function, both gcc and clang (x86) produce identical machine code at -O3:
int func (int a[2])
{
if(a)
return 1;
return 0;
}
Disassembly:
func:
xor eax, eax
test rdi, rdi
setne al
ret
When changing the parameter to int a[static 2], gcc gives the same output as before, but clang does a better job:
func:
mov eax, 1
ret
Since clang realizes that a can never be NULL, so it can skip the check.
this is my first question, because I couldn't find anything related to this topic.
Recently, while making a class for my C game engine project I've found something interesting:
struct Stack *S1 = new(Stack);
struct Stack *S2 = new(Stack);
S1->bPush(S1, 1, 2); //at this point
bPush is a function pointer in the structure.
So I wondered, what does operator -> in that case, and I've discovered:
mov r8b,2 ; a char, written to a low point of register r8
mov dl,1 ; also a char, but to d this time
mov rcx,qword ptr [S1] ; this is the 1st parameter of function
mov rax,qword ptr [S1] ; !Why cannot I use this one?
call qword ptr [rax+1A0h] ; pointer call
so I assume -> writes an object pointer to rcx, and I'd like to use it in functions (methods they shall be). So the question is, how can I do something alike
push rcx
// do other call vars
pop rcx
mov qword ptr [this], rcx
before it starts writing other variables of the function. Something with preprocessor?
It looks like you'd have an easier time (and get asm that's the same or more efficient) if you wrote in C++ so you could use language built-in support for virtual functions, and for running constructors on initialization. Not to mention not having to manually run destructors. You wouldn't need your struct Class hack.
I'd like to implicitly pass *this pointer, because as shown in second asm part it does the same thing twice, yes, it is what I'm looking for, bPush is a part of a struct and it cannot be called from outside, but I have to pass the pointer S1, which it already has.
You get inefficient asm because you disabled optimization.
MSVC -O2 or -Ox doesn't reload the static pointer twice. It does waste a mov instruction copying between registers, but if you want better asm use a better compiler (like gcc or clang).
The oldest MSVC on the Godbolt compiler explorer is CL19.0 from MSVC 2015, which compiles this source
struct Stack {
int stuff[4];
void (*bPush)(struct Stack*, unsigned char value, unsigned char length);
};
struct Stack *const S1 = new(Stack);
int foo(){
S1->bPush(S1, 1, 2);
//S1->bPush(S1, 1, 2);
return 0; // prevent tailcall optimization
}
into this asm (Godbolt)
# MSVC 2015 -O2
int foo(void) PROC ; foo, COMDAT
$LN4:
sub rsp, 40 ; 00000028H
mov rax, QWORD PTR Stack * __ptr64 __ptr64 S1
mov r8b, 2
mov dl, 1
mov rcx, rax ;; copy RAX to the arg-passing register
call QWORD PTR [rax+16]
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
int foo(void) ENDP ; foo
(I compiled in C++ mode so I could write S1 = new(Stack) without having to copy your github code, and write it at global scope with a non-constant initializer.)
Clang7.0 -O3 loads into RCX straight away:
# clang -O3
foo():
sub rsp, 40
mov rcx, qword ptr [rip + S1]
mov dl, 1
mov r8b, 2
call qword ptr [rcx + 16] # uses the arg-passing register
xor eax, eax
add rsp, 40
ret
Strangely, clang only decides to use low-byte registers when targeting the Windows ABI with __attribute__((ms_abi)). It uses mov esi, 1 to avoid false dependencies when targeting its default Linux calling convention, not mov sil, 1.
Or if you are using optimization, then it's because even older MSVC is even worse. In that case you probably can't do anything in the C source to fix it, although you might try using a struct Stack *p = S1 local variable to hand-hold the compiler into loading it into a register once and reusing it from there.)
I am trying to get a sense of how I should use const in C code. First I didn't really bother using it, but then I saw a quite a few examples of const being used throughout. Should I make an effort and go back and religiously make suitable variables const? Or will I just be waisting my time?
I suppose it makes it easier to read which variables that are expected to change, especially in function calls, both for humans and the compiler. Am I missing any other important points?
const is typed, #define macros are not.
const is scoped by C block, #define applies to a file (or more strictly, a compilation unit).
const is most useful with parameter passing. If you see const used on a prototype with pointers, you know it is safe to pass your array or struct because the function will not alter it. No const and it can.
Look at the definition for such as strcpy() and you will see what I mean. Apply "const-ness" to function prototypes at the outset. Retro-fitting const is not so much difficult as "a lot of work" (but OK if you get paid by the hour).
Also consider:
const char *s = "Hello World";
char *s = "Hello World";
which is correct, and why?
How do I best use the const keyword in C?
Use const when you want to make it "read-only". It's that simple :)
Using const is not only a good practice but improves the readability and comprehensibility of the code as well as helps prevent some common errors. Definitely do use const where appropriate.
Apart from producing a compiler error when attempting to modify the constant and passing the constant as a non-const parameter, therefore acting as a compiler guard, it also enables the compiler to perform certain optimisations knowing that the value will not change and therefore it can cache the value and not have to read it fresh from memory, because it won't have changed, and it allows it to be immediately substituted in the code.
C const
const and register are basically the opposite of volatile and using volatile will override the const optimisations at file and block scope and the register optimisations at block-scope. const register and register will produce identical outputs because const does nothing on C at block-scope on gcc C -O0, and is redundant on -O1 and onwards, so only the register optimisations apply at -O0, and are redundant from -O1 onwards.
#include<stdio.h>
int main() {
const int i = 1;
printf("%d", i);
}
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4] //load from stack isn't eliminated for block-scope consts on gcc C unlike on gcc C++ and clang C, even though value will be the same
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
In this instance, with -O0, const, volatile and auto all produce the same code, with only register differing c.f.
#include<stdio.h>
const int i = 1;
int main() {
printf("%d", i);
}
i:
.long 1
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov eax, DWORD PTR i[rip] //load from memory
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
with const int i = 1; instead:
i:
.long 1
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov eax, 1 //saves load from memory, now immediate
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
C++ const
#include <iostream>
int main() {
int i = 1;
std::cout << i;
}
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1 //stores on stack
mov eax, DWORD PTR [rbp-4] //loads the value stored on the stack
mov esi, eax
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
leave
ret
#include <iostream>
int main() {
const int i = 1;
std::cout << i;
}
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1 //stores it on the stack
mov esi, 1 //but saves a load from memory here, unlike on C
//'register' would skip this store on the stack altogether
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
leave
ret
#include <iostream>
int i = 1;
int main() {
std::cout << i;
}
i:
.long 1
main:
push rbp
mov rbp, rsp
mov eax, DWORD PTR i[rip] //load from memory
mov esi, eax
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
pop rbp
ret
#include <iostream>
const int i = 1;
int main() {
std::cout << i;
}
main:
push rbp
mov rbp, rsp
mov esi, 1 //eliminated load from memory, now immediate
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
pop rbp
ret
C++ has the extra restriction of producing a compiler error if a const is not initialised (both at file-scope and block-scope). const also has internal linkage as a default on C++. volatile still overrides const and register but const register combines both optimisations on C++.
Even though all the above code is compiled using the default implicit -O0, when compiled with -Ofast, const surprisingly still isn't redundant on C or C++ on clang or gcc for file-scoped consts. The load from memory isn't optimised out unless const is used, even if the file-scope variable isn't modified in the code. https://godbolt.org/z/PhDdxk.
If I compile:
int *a;
void main(void)
{
*a = 1;
}
and then disassemble main in cdb I get:
pointersproject!main:
00000001`3fd51010 mov rax,qword ptr [pointersproject!a (00000001`3fd632f0)]
00000001`3fd51017 mov dword ptr [rax],1
00000001`3fd5101d xor eax,eax
00000001`3fd5101f ret
So *a is symbolized by pointersproject!a. All good.
However, if I declare the pointer within main:
void main(void)
{
int *a;
a = 1;
}
I see that a is just an offset from the stack pointer (I believe), rather then the human-readable structure I'd expect (like, say pointersproject!main!a):
pointersproject!main:
00000001`3fd51010 sub rsp,18h
00000001`3fd51014 mov rax,qword ptr [rsp]
00000001`3fd51018 mov dword ptr [rax],1
00000001`3fd5101e xor eax,eax
00000001`3fd51020 add rsp,18h
00000001`3fd51024 ret
This is probably as much about my understanding of what the compiler's done as anything else but: can anyone explain why the notation for a isn't what I expect?
(This inspired by musing while looking at x64 Windows Debugging: Practical Foundations by Dmitry Vostokov).
When a variable is defined inside a function, it is an automatic variable unless explicitly declared static. Such variables only live during the execution of the function and are normally allocated in the stack, thus they are deallocated when the function exits. The change you see in the complied code is not due to the change in scope but to the change from static to automatic variable. If you make a static, it will not be allocated in the stack, even if its scope is the function main.