Memory allocation after declaration of extern class variable - c

I have read in multiple places that when declaring an extern variable, the memory is not designated until the definition is made. I was trying this code which is giving contradictory output in gcc.
#include <stdio.h>
int main() {
extern int a;
printf("%lu", sizeof(a));
return 0;
}
It should have shown an error or a size of zero, but the output was the following. Please explain the output. Is it an example of undefined behavior?
aditya#theMonster:~$ ./a
4

You're able to get away with it here because a is never actually used. The expression sizeof(a) is evaluated at compile time. So because a is never referenced, the linker doesn't bother looking for it.
Had you done this instead:
printf("%d\n", a);
Then the program would have failed to link, printing "undefined reference to `a'"

The size of a variable is the size of its data type, whether it is presently only an extern or not. Since sizeof is evaluated at compile time, whereas symbol resolution is done at link time, this is acceptable.
Even with -O0, gcc doesn't care that it's extern; it puts 4 in esi for the argument to printf: https://godbolt.org/z/Zv2VYd
Without a definition of a, however, any of the following will fail to link:
a = 3;
printf("%d\n", a);
int *p = &a;

a is an int, so its size is 4 (on this platform).
Its location (address) and value are not currently known (it is defined somewhere else, in some other location).
But the size is well defined.

sizeof(expr/var_name/data_type) is a unary operator that yields a size_t and, unless its operand is a variable length array, does not evaluate the operand. It only checks the type of the expression.
Similarly, in sizeof(a) the compiler only checks the type of a, which is int, and hence yields the size of int.
Another example that should clear up the confusion: in sizeof(i++), i does not get incremented. Only the type of the expression is checked and its size is returned.
One more example:
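#include <stdio.h>
int main(void){
int p=0;
/* the assignment inside sizeof is never executed */
printf("%zu\n",sizeof(p=2+3.0));
printf("%d\n",p);
return 0;
}
will give you the following output on gcc: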
4
0

There is indeed a problem in your code, but not where you expect it:
Passing a value of type size_t for the printf conversion specification %lu has undefined behavior if size_t and unsigned long have different sizes or representations, as is the case on many systems (16-bit systems, Windows 64-bit...).
Here is a corrected version, portable to non C99-conforming systems, whose C library printf might not support %zu:
#include <stdio.h>
int main(void) {
extern int a;
printf("%lu\n", (unsigned long)sizeof(a));
return 0;
}
Regarding why the program compiles and executes without an error:
Variable a is declared inside the body of main with external linkage: no space is allocated for it, and a would be undeclared outside the body of main.
sizeof(a) is evaluated at compile time as the constant value sizeof(int), which happens to be 4 on your platform.
No further reference to a is generated by the compiler, so the linker does not complain about a not being defined anywhere.

Related

The placement of static global and global variables with the same identifier

I'm learning some basics about linking and encountered the following code.
file: f1.c
#include <stdio.h>
static int foo;
int main() {
int *bar();
printf("%ld\n", bar() - &foo);
return 0;
}
file: f2.c
int foo = 0;
int *bar() {
return &foo;
}
Then a problem asks me whether this statement is correct: No matter how the program is compiled, linked or run, the output of it must be a constant (with respect to multiple runs) and it is non-zero.
I think this is correct. Although there are two definitions of foo, one of them is declared static, so inside f1.c it hides the global foo, and the linker keeps both objects rather than merging them into one. Since the relative position of variables should be fixed at run time (although the absolute addresses can vary), the output must be a constant.
I experimented with the code and on gcc 7.5.0 with gcc f1.c f2.c -o test && ./test it would always output 1 (but if I remove the static, it would output 0). But the answer says that the statement above is wrong. I wonder why. Are there any mistakes in my understanding?
According to objdump, both foos go to .bss.
Context. This is a problem related to the linking chapter of Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron. But it does not come from the book.
Update. OK now I've found out the reason. If we swap the order and compile as gcc f2.c f1.c -o test && ./test, it will output -1. Quite a boring problem...
Indeed the static variable foo in the f1.c module is a different object from the global foo in the f2.c module referred to by the bar() function. Hence the output should be non zero.
Note however that subtracting two pointers that do not point into the same array (or one past its end) has undefined behavior, hence the difference might be 0 even for different objects. This may happen even though &foo == bar() would be 0 (false) because the objects are different. This behavior was commonplace on 16-bit segmented systems using the large model, where subtracting pointers only affected the offset portion of the pointers whereas comparing them for equality compared both the segment and the offset parts. Modern systems have a more regular architecture where everything is in the same address space. Just be aware that not every system is a Linux PC.
Furthermore, the printf conversion format %ld expects a value of type long whereas you pass a value of type ptrdiff_t which may be a different type (namely 64-bit long long on Windows 64-bit targets for example, which is different from 32-bit long there). Either use the correct format %td or cast the argument as (long)(bar() - &foo).
Finally, nothing in the C language guarantees that the difference between the addresses of global objects be constant across different runs of the same program. Many modern systems perform address space randomisation to lessen the risk of successful attacks, leading to different addresses for stack objects and/or static data in successive runs of the same executable.
Setting aside the wrong printf formats and the pointer arithmetic problems, a static global variable in one compilation unit is a different object from static and non-static variables with the same name in other compilation units.
To correctly see the difference in chars, cast both pointers to char pointers and use the %td format, which prints a value of type ptrdiff_t. If your platform does not support it, cast the result to long long int:
int main() {
int *bar();
printf("%td\n", (char *)bar() - (char *)&foo);
return 0;
}
or
printf("%lld\n", (long long)((char *)bar() - (char *)&foo));
If you want to store this difference in the variable use ptrdiff_t type:
ptrdiff_t diff = (char *)bar() - (char *)&foo;

Variable declaration not properly checked by compiler

When a variable or pointer is declared with an initializer that refers to the variable itself, the compiler treats the variable as already declared at that point.
I have tried both gcc and clang and they compile the "faulty" code without complaining.
CASE 1: This will not compile since "a" is not declared:
void main()
{
int b=sizeof(a);
}
CASE 2: This compiles without a problem:
void main()
{
int a=sizeof(a);
}
Shouldn't the compiler generate the "a is undeclared" error instead, just like in case 1?
Shouldn't the compiler generate the "a is undeclared" error instead, just like in case 1?
Why? It just saw you declare a.
int a = sizeof(a);
// ^--- here it is, before its first use
The scope of a variable begins right after its declarator is seen, just before its (optional) initializer. So you can even write the truly faulty
int a = a;
Note however that in your case there is nothing faulty being done. The result of sizeof depends only on the type of a, and the type is known. This is a well-defined initialization (with a conversion from size_t to int, but not one to be worried about).
sizeof is not a function depending on the value of a; it is a builtin that is evaluated at compile time, so it becomes equivalent to
int a = 4;

Why applying sizeof operator to an extern variable does not output 0

The output of the following code is 4.
Shouldn't it be 0?
After all, a is declared but has not been defined, and hence no memory is allocated for it.
#include <stdio.h>
#include <stdlib.h>
int main()
{
extern int a;
printf("%ld",sizeof(a));
return 0;
}
We know what the size of a is even if it is not defined in this module. sizeof does not tell you how much memory has been allocated for an object in this module. It tells you how much memory the object requires.
Two points:
sizeof is evaluated at compile time, not run time [1]. It doesn't depend on when a has actually been allocated. This is because...
sizeof operates on types, not objects. When the operand is an object expression like a, the type of that expression is used. Again, this is all done at compile time, when the type of an expression is known to the compiler.
[1] Except for variably modified types like VLAs.
Just because the variable is not defined in this translation unit doesn't mean the compiler doesn't know its size. If it didn't, it wouldn't be able to read or write from it.
The sizeof operator is valid on any variable or complete type, and int is a complete type. If on the other hand you had a forward declaration of a struct:
struct mystruct;
You couldn't calculate sizeof(struct mystruct) because the type is incomplete and thus the size can't be known.
sizeof returns the amount of memory required by that data type.
For int it is 4 bytes here, not 0 bytes, hence the output.
This link will help you further with the details:
https://www.geeksforgeeks.org/sizeof-operator-c/
The behavior of your program is actually undefined:
the format %ld may be inappropriate for type size_t, the type of the sizeof(a) expression. If so, the behavior is undefined. The proper format is %zu, but this C99 format is not always correctly supported on common targets, such as all but the most recent versions of Visual Studio.
For best portability, you should write:
#include <stdio.h>
int main(void) {
extern int a;
printf("%d\n", (int)sizeof(a));
return 0;
}
Modified this way, the program will still output the size of the int type (4 on your platform) and not complain about a not being defined. The reasons are:
sizeof(a) is evaluated at compile time, from the declared type in extern int a;. a need not be defined at all. Its type is declared as int and sizeof(a) (or simply sizeof a) is completely equivalent to sizeof(int).
the code does not make a reference to the object a. The compiler will produce the executable even if a is not defined in one of the modules compiled and linked together.

Passing float to a function with int argument (that is not declared beforehand)

I have read Garbage value when passed float values to the function accepting integer parameters answers. My question goes a bit deeper. I could have also asked there had I more than 50 reputation point. I am adding my code for more clarification:
#include <stdio.h>
#include <string.h>
void p2(unsigned int tmp)
{
printf("From p2: \n");
printf("tmp = %d ,In hex tmp = %x\n", tmp, tmp);
}
int main()
{
float fvar = 45.65;
p1(fvar);
p2(fvar);
printf("From main:\n");
printf("sizeof(int) = %lu, sizeof(float) = %lu\n", sizeof(int),
sizeof(float));
unsigned int ui;
memcpy(&ui, &fvar, sizeof(fvar));
printf("fvar = %x\n", ui);
return 0;
}
void p1(unsigned int tmp)
{
printf("From p1: \n");
printf("tmp = %d ,In hex tmp = %x\n", tmp, tmp);
}
The output is:
From p1:
tmp = 1 ,In hex tmp = 1
From p2:
tmp = 45 ,In hex tmp = 2d
From main:
sizeof(int) = 4, sizeof(float) = 4
fvar = 4236999a
Passing a float value to a function that is declared beforehand (i.e. p2) with int arguments gives the correct result. Trying the same with a function that is not declared beforehand (i.e. p1) gives incorrect values. I know the reason: the compiler does not assume any type or arity of arguments for a function that has not been declared beforehand. That's why the float value does not get converted to int in the case of p1.
My confusion is, in the case of p1, how exactly does the float value get copied to the local int variable tmp?
If it is a 'bit by bit copy', then reading those locations should yield something (other than 1) in hex at least (if not in integer). But that does not seem to be the case, as the output shows. I know that the float representation is different.
And how may p1 read registers/stack locations that floats weren't copied to, as simonc suggested in the linked question?
I have included the size of int and float both and my compiler is gcc if that helps.
The C programming language is essentially a single-scan language - a compiler doesn't need to reread the code but it can assemble it line by line, retaining information only on how identifiers were declared.
The C89 standard had the concept of implicit declaration. In absence of a declaration, the function p1 is declared implicitly as int p1(); i.e. a function that returns an int and takes unspecified arguments that go through default argument promotions. When you call such a function giving it a float as an argument, the float argument is promoted to a double, as called for by default argument promotions. It would be fine if the function was int p1(double arg); but the expected argument type is unsigned int, and the return value is not compatible either (void vs int). This mismatch will cause the program to have undefined behaviour - there is no point in reasoning what is happening then. However, there are many old C programs that would fail to compile, if the compilers wouldn't support the archaic implicit declarations - thus you just need to consider all these warnings as errors.
Notice that if you change the return type of p1 to int, you will get fewer warnings:
% gcc implicit.c
implicit.c:14:5: warning: implicit declaration of function ‘p1’ [-Wimplicit-function-declaration]
p1(fvar);
^~
But the observed behaviour on my compiler would be mostly the same.
Thus the presence of mere warning: implicit declaration of function ‘x’ is quite likely a serious error in newly written code.
Were the function declared before its use, as is the case with p2, the compiler would know that it expects an unsigned int as the argument and returns void, and therefore it would know to generate correct conversion code from float to unsigned int for the argument.
C99 and C11 do not allow implicit function declarations in strictly conforming programs, but they also do not require a conforming compiler to reject them. C11 says:
An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator).
and a footnote noting that
Thus, an undeclared identifier is a violation of the syntax.
However, it doesn't require a compiler to reject them.
This,
void p1(unsigned int tmp);
would be implicitly declared as
int p1();
by the compiler.
Although the compiler does not throw an error, it should be considered one as you can read in the linked post.
In any event, this is undefined behavior and you can't expect a predictable output.
At the binary level, float and int don't look alike at all.
When a float is assigned to an int, there's an implicit conversion. That's why, when you call a function that takes an int argument but provide a float, you get the integer part of it. In the final test, though, you get to see what the float really looks like. That's not garbage: it is how a float looks in memory if you print it in hexadecimal. See IEEE 754 for details.
The issue with p1() however is that you are trying to call a function that has not been declared, so it's automatically declared as int p1(). Even though you later define it as void p1(unsigned int tmp), it has already been declared as int p1() (returning int and taking unspecified parameters), so it doesn't work (the behavior is undefined). I'm pretty sure the compiler is screaming with warnings and errors about that, and those aren't meant to be ignored.
Notice there's a big difference between declaring and defining a function. It is perfectly legal to define a function later, the way you are doing, but if you want it to work properly it has to be declared before any attempt to use it.
Example:
// declare functions
void p1(unsigned int tmp);
void p2(unsigned int tmp);
// use functions
int main()
{
p1(1);
p2(1);
}
// define functions
void p1(unsigned int tmp)
{
// do stuff
}
void p2(unsigned int tmp)
{
// do stuff
}

Segmentation fault while accessing the return address from a C function in 64 bit machine

I have code in C (linux(x86_64)) some like this:
typedef struct
{
char k[32];
int v;
}ABC;
ABC states[6] = {0};
ABC* get_abc()
{
return &states[5];
}
while in main():
int main()
{
ABC *p = get_abc();
.
.
.
printf("%d\n", p->v);
}
I am getting segmentation fault at printf statement while accessing p->v. I tried to debug it from gdb and it says "can not access the memory". One important thing here is that when I compile this code, gcc throws me a warning on ABC *p = get_abc(); that I am trying to convert pointer from integer. My question here is that I am returning address of structure from get_abc() then why compiler gives me such warning? why compiler considers it as integer? I think I am getting segmentation fault due to this warning as an integer can not hold memory address in x86_64.
Any help would be appreciated.
Declare the get_abc prototype before the main function. If no function prototype is visible before the call, the compiler by default treats the function as taking unspecified arguments and returning int. Here get_abc actually returns an 8-byte address, but that value gets truncated to 4 bytes before being stored in ABC *p, which leads to the crash.
ABC* get_abc();
int main()
{
ABC *p = get_abc();
}
Note: This crash will not occur on a 32-bit machine, where both int and addresses are 4 bytes, because no truncation happens. But the warning will still be there.
You haven't shown us all your code, but I can guess with some confidence that the get_abc() and main() functions are defined in separate source files, and that there's no declaration of get_abc() visible from the call in main().
You should create a header file that contains a declaration of get_abc():
ABC *get_abc();
and #include that header both in the file that defines get_abc() and in the one that defines main(). (You'll also need header guards.) You'll need to move the definition of the type ABC to that header.
Or, as a quick-and-dirty workaround, you can add an explicit declaration before your definition of main() -- but that's a rather brittle solution, since it depends on you to get the declaration exactly right.
In the absence of a visible declaration, an undeclared function is assumed to return int. The compiler sees your call to get_abc(), generates code to call it as if it returned an int, and implicitly converts that int value to a pointer. Hilarity ensues.
Note: There actually is no implicit conversion from int to pointer types, apart from the special case of a null pointer constant, but many compilers implement such an implicit conversion for historical reasons. Also, the "implicit int" rule was dropped in the 1999 version of the standard, but again, many compilers still implement it for historical reasons. Your compiler should have options to enable better warnings. If you're using gcc, try gcc -pedantic -std=c99 -Wall -Wextra.

Resources