Extern and global variable - c

I am unable to understand the weird behaviour of this program. It produces a warning but the output is surprising. I have 3 files: file1.c, file2.c and file1.h.
file1.c :
#include "file1.h"
extern void func(void);
int main(void)
{
func();
printf("%d\n",var);
return 0;
}
file2.c:
int var;
void func(void)
{
var = 1;
}
and file1.h is :
#include<stdio.h>
extern double var;
When I compile and run it, it shows a warning as expected, but it prints 1. How is that possible, when I am changing the var defined in file2.c?

You seem to have undefined behavior -- but one that is probably innocuous on many machines much of the time.
Since your extern declaration says var is a double, when it's actually an int, attempting to access it as a double causes undefined behavior.
In reality, however, in a typical case a double will be eight bytes, whereas an int is four bytes. This means that when you pass var to printf, 8 bytes of data will be pushed onto the stack (but starting from the four bytes that var actually occupies).
printf will then interpret the first four bytes of that (which do correspond to the bytes of var) as an int, which does match the actual type of var. This is why the behavior will frequently be innocuous.
At the same time, if your compiler was doing bounds checking, your attempt at loading 8 bytes of a four-byte variable could easily lead to some sort of "out of bounds" exception, so your program wouldn't work right. It's just your misfortune that most C compilers don't do any bounds-checking.
It's also possible (probably on a big-endian machine) that instead of the bytes corresponding to var ending up in the right place on the stack for printf to read them as an int, the other four bytes of the supposed double end up in that place, and it would print garbage.
Yet another possibility would be that the bit pattern of the 8 bytes would happen to correspond to some floating point value that would cause an exception as soon as you tried to access it as a floating point number. Given the (small) percentage of bit patterns that correspond to things like signaling NaN's, chances of that arising are fairly small though (and even the right bit pattern might not trigger anything, if the compiler just pushed 8 bytes on the stack and left it at that).
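The practical fix is simply to make the declaration match the definition. A minimal corrected file1.h, assuming the same three-file layout as in the question:
#include <stdio.h>
/* The declaration must agree with the definition in file2.c: var is an int. */
extern int var;
With the types in agreement, printf reads exactly the four bytes of var, and both the warning and the undefined behavior go away.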

Just setting the type on an extern declaration doesn't change the value stored in memory - it only changes how the value is interpreted. So the value 0x0001 is what is stored in memory no matter how you try to interpret it.
Once you've called func(), you've stored the value 0x0001 in the location where var lives. Then in main, when var is passed to printf, the value stored at var's location plus the next 4 bytes (assuming a 32-bit int) is pushed onto the stack; that is, a value of the form 0x0001xxxx is pushed. But %d treats the argument as an int and reads only the first 4 bytes, which are 0x0001, so it prints 1.
Try changing your code as follows:
file1.c :
#include "file1.h"
extern void func(void);
int main(void)
{
func();
printf("%d %d\n",var);
return 0;
}
file2.c:
int var;
int var2;
void func(void)
{
var = 1;
var2 = 2;
}
and file1.h is :
#include<stdio.h>
extern double var;
You will probably get the output 1 2 (this depends on how the compiler lays out the variables, but it is likely to place them in adjacent locations).
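A related tip: if file2.c also includes file1.h, the compiler sees both the declaration and the definition in one translation unit, and the silent mismatch becomes a compile-time diagnostic instead of undefined behavior at run time. A sketch of that arrangement (with the original, mismatched extern double var; still in the header):
/* file2.c */
#include "file1.h"   /* with "extern double var;" still in the header, the
                        compiler now reports "conflicting types for 'var'"
                        instead of silently linking */
int var;

void func(void)
{
    var = 1;
}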

Related

The placement of static global and global variables with the same identifier

I'm learning some basics about linking and encountered the following code.
file: f1.c
#include <stdio.h>
static int foo;
int main() {
int *bar();
printf("%ld\n", bar() - &foo);
return 0;
}
file: f2.c
int foo = 0;
int *bar() {
return &foo;
}
Then a problem asks me whether this statement is correct: no matter how the program is compiled, linked or run, its output must be a constant (with respect to multiple runs), and it is non-zero.
I think this is correct. Although there are two definitions of foo, one of them is declared static, so it hides the global foo inside f1.c and the linker will not merge them into a single foo. Since the relative positions of the variables should be fixed at run time (even though the absolute addresses can vary), the output must be a constant.
I experimented with the code and on gcc 7.5.0 with gcc f1.c f2.c -o test && ./test it would always output 1 (but if I remove the static, it would output 0). But the answer says that the statement above is wrong. I wonder why. Are there any mistakes in my understanding?
From the output of objdump, both foos go to .bss.
Context. This is a problem related to the linking chapter of Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron. But it does not come from the book.
Update. OK now I've found out the reason. If we swap the order and compile as gcc f2.c f1.c -o test && ./test, it will output -1. Quite a boring problem...
Indeed, the static variable foo in the f1.c module is a different object from the global foo in the f2.c module referred to by the bar() function. Hence the output should be non-zero.
Note however that subtracting 2 pointers that do not point into the same array (or one past the end of the same array) is meaningless, hence the difference might be 0 even for different objects. This may happen even though &foo == bar() evaluates to 0, because the objects are different. This behavior was commonplace on 16-bit segmented systems using the large model, where subtracting pointers only used the offset portion of the pointers, whereas comparing them for equality compared both the segment and the offset parts. Modern systems have a more regular architecture where everything is in the same address space. Just be aware that not every system is a Linux PC.
Furthermore, the printf conversion format %ld expects a value of type long whereas you pass a value of type ptrdiff_t which may be a different type (namely 64-bit long long on Windows 64-bit targets for example, which is different from 32-bit long there). Either use the correct format %td or cast the argument as (long)(bar() - &foo).
Finally, nothing in the C language guarantees that the difference between the addresses of global objects be constant across different runs of the same program. Many modern systems perform address space randomisation to lessen the risk of successful attacks, leading to different addresses for stack objects and/or static data in successive runs of the same executable.
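To see that last point, here is a tiny sketch (the behaviour depends on whether your toolchain builds a position-independent executable with ASLR enabled; the variable name is just an example):
#include <stdio.h>

static int foo;

int main(void) {
    /* With ASLR and a PIE build, successive runs typically print
       different addresses for the same static object. */
    printf("%p\n", (void *)&foo);
    return 0;
}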
Abstracting from the wrong printf formats and the pointer arithmetic problems: a static global variable in one compilation unit is a different object from static and non-static variables with the same name in other compilation units.
To correctly see the difference in chars, you should cast both pointers to char * and use the %td format, which prints a ptrdiff_t. If your platform does not support it, cast the result to long long int:
/* f1.c, adjusted */
#include <stdio.h>
static int foo;
int main() {
int *bar();
printf("%td\n", (char *)bar() - (char *)&foo);
return 0;
}
or
printf("%lld\n", (long long)((char *)bar() - (char *)&foo));
If you want to store this difference in the variable use ptrdiff_t type:
ptrdiff_t diff = (char *)bar() - (char *)&foo;

Memory allocation after declaration of extern class variable

I have read in multiple places that when an extern variable is declared, no memory is allocated for it until the definition is made. I tried this code, which gives contradictory output with gcc.
#include <stdio.h>
int main() {
extern int a;
printf("%lu", sizeof(a));
return 0;
}
It should have shown an error or a zero size, but the output was the following. Please explain the output. Is it an example of another undefined behavior?
aditya#theMonster:~$ ./a
4
You're able to get away with it here because a is never actually used. The expression sizeof(a) is evaluated at compile time. So because a is never referenced, the linker doesn't bother looking for it.
Had you done this instead:
printf("%d\n", a);
Then the program would have failed to link, printing "undefined reference to `a'"
The size of a variable is the size of its data type, whether it is presently only an extern or not. Since sizeof is evaluated at compile time, whereas symbol resolution is done at link time, this is acceptable.
Even with -O0, gcc doesn't care that it's extern; it puts 4 in esi for the argument to printf: https://godbolt.org/z/Zv2VYd
Without a definition of a anywhere, however, any of the following will fail to link (see the sketch after this list):
a = 3;
printf("%d\n", a);
int *p = &a;
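A quick single-file sketch of the difference (the exact linker message varies by toolchain):
#include <stdio.h>

extern int a;   /* declared here, but never defined anywhere */

int main(void) {
    printf("%zu\n", sizeof(a));   /* fine: only the type of a is needed */
    /* printf("%d\n", a); */      /* uncommenting this makes the link fail
                                     with "undefined reference to `a'" */
    return 0;
}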
a is an integer, so its size is 4.
Its location (address) and value are not currently known (it is defined extern, at some other location).
But its size is well defined.
sizeof (which accepts an expression or a type name and yields a size_t) is a unary operator which, unless its operand is a variable length array, does not evaluate the expression; it only inspects the expression's type.
Similarly, here, in sizeof(a), the compiler only checks the type of a, which is int, and hence returns the size of an int.
Another example to clear your confusion: in sizeof(i++), i does not get incremented; its type is checked and the corresponding size returned.
One more example:
int main(void){
int p=0;
printf("%zu\n",sizeof(p=2+3.0));
printf("%d\n",p);
return 0;
}
will give you the following output on gcc:
4
0
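The variable length array case mentioned above is the one real exception: a VLA's size is not known at compile time, so sizeof has to be computed at run time. A small sketch, assuming a compiler with C99 VLA support:
#include <stdio.h>

int main(void) {
    int i = 3;
    int fixed[10];
    int vla[i];                          /* variable length array, size 3 */

    printf("%zu\n", sizeof(fixed[i++])); /* operand not evaluated: i stays 3 */
    printf("%d\n", i);                   /* prints 3 */
    printf("%zu\n", sizeof vla);         /* computed at run time: 3 * sizeof(int) */
    return 0;
}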
There is indeed a problem in your code, but not where you expect it:
passing a value of type size_t for the printf conversion specification %lu has undefined behavior if size_t and unsigned long have different sizes or representations, as is the case on many systems (16-bit systems, 64-bit Windows...).
Here is a corrected version, portable to non C99-conforming systems, whose C library printf might not support %zu:
#include <stdio.h>
int main(void) {
extern int a;
printf("%lu\n", (unsigned long)sizeof(a));
return 0;
}
Regarding why the program compiles and executes without an error:
Variable a is declared inside the body of main with external linkage: no space is allocated for it, and the declaration of a is not visible outside the body of main.
sizeof(a) is evaluated at compile time as the constant value sizeof(int), which happens to be 4 on your platform.
No further reference to a is generated by the compiler, so the linker does not complain about a not being defined anywhere.

Call stack is reused between 2 function calls

The following simple C program outputs 42, to my surprise. I expected it to output a garbage value or 0, since the stack frames for foo() and bar() are different. How is the output deterministically 42?
#include <stdio.h>
void foo(void){
int a;
printf("%d\n",a);
}
void bar(){
int a=42;
}
int main(){
bar();
foo();
return 0;
}
>>gcc -o test test.c
>>./test
42
When I instruct the compiler to optimize the code, it prints garbage!
>>gcc -O -o test test.c
>>./test
2487239847
Yes, the value 42 is garbage. Here is the explanation:
Every function's stack frame is laid out roughly in this order:
Parameters of the function
Return address of the function
EBP (which stores the previous frame pointer)
Exception handler frame
Local variables
Buffer
Callee-saved registers
In the above example, main() is called and its frame is set up as above.
Then bar() is encountered: steps 1-4 above happen, the local variable a = 42 is stored in memory (step 5), steps 6 and 7 follow, and the frame is given up when bar() returns.
Then foo() is encountered and goes through the same steps 1-4, in the same memory region that bar() used. The local variable a you declared there therefore sits at the same memory location where bar() left a = 42. So printing it gives the same value 42, which is really just a leftover (garbage) value.
To validate this, try this example, which prints 7:
#include <stdio.h>
#include <string.h>
void foo() {
int b;
printf("%d\n",b);
}
void zoo() {
int dummy = 7;
}
void bar(){
int a1=3;
}
int main(){
bar();
zoo();
foo();
return 0;
}
Ref: Doc
The function calls go on the stack (their activation records). In this case the two functions foo and bar are identical in terms of parameters, return type and local variables. The first call puts its record on the stack and, once it is done, the record is popped but not cleaned (the compiler does not generate code to clean it up). The second call then ends up using the same memory locations, and therefore picks up the value left there by the previous call.
Values that are not initialized in a function are undefined. In the foo and bar case we get the same value because the two functions have similar signatures and the variable has the same type and location in both. Try adding two integers, or an integer and a float, in one function and reversing the order in the next, and you'll see the effects (see the sketch below). Of course, once you enable the optimizer it may keep things in registers instead, which is why the optimized build prints an unpredictable value in foo.
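A sketch of that experiment (this is still undefined behavior, so treat the output as illustrative only; compile without optimization to see the effect):
#include <stdio.h>

void writer(void) {
    int i = 42;
    float f = 3.5f;
    (void)i;                      /* silence unused-variable warnings */
    (void)f;
}

void reader(void) {
    /* Same frame size as writer(), but the declaration order is swapped. */
    float f;
    int i;
    printf("%d %f\n", i, f);      /* prints whatever writer() left behind */
}

int main(void) {
    writer();
    reader();
    return 0;
}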

understanding call a pointer to a function

I am trying to understand what a pointer to a function is in C.
I would like to see in some detail what happens when a function is called through a pointer, so that I can understand function pointers better.
Could somebody explain why the code below does not crash, and why it produces such weird output?
To narrow it down, I am looking for something like javap, which could show how the JDK compiles my code and how the JVM runs it.
What is the relationship between the void return and the number 14, 15 or 16 (the value printed where the void function's return value should be)?
Is there any security problem with my second parameter, or is it the same as any uninitialized value?
test.c
#include <stdio.h>
#include <stdlib.h>
static void f(int x, int y){
printf("x = %d \n", x );
printf("y = %d \n", y );
}
typedef int (*FUNC)(int);
int main(void){
long int addr = (long int)f;
printf("%d \n", (int)((FUNC)addr)(1) );
return 0;
}
Output on Mac OS X, compiled with i686-apple-darwin11-llvm-gcc-4.2:
x = 1
y = 1479046720
16
The answer is undefined behavior. You're using two incompatible function pointer types and use one to call the other. (Not to mention storing the pointer in an integer, etc., etc.) Thus, your program invokes undefined behavior, and as such, anything can happen. And the values you get are most probably just random crap from the messed up stack and/or CPU registers.
You are causing undefined behavior all over the place:
You're storing the function pointer in an integer, which isn't guaranteed to work
You're casting said integer to a different type of function pointer (fewer parameters) with a different return type
You're calling the function with fewer parameters than it expects
You take a return value from a function returning void
Trying to make sense of this is just unreasonable, but as a guess since you're using x86:
x is populated correctly in the function with the 1 you passed
y isn't so it gets a random value, likely some leftover on the stack
there's no return value, so you get whatever was left in the EAX register
Could somebody explain why the code below does not crash and produces such weird output?
Doing the wrong thing isn't guaranteed to crash your program - there are no guarantees.
The value 16 you get back in the last output is probably the number of characters written by the last printf inside f - that's what printf returns, and nothing else happens after it in the function. But as others have said, you're asking what happens when nothing is defined: it may not work at all, it may crash and burn, or it may give you some "random" values as the return value and printout. Compile the code with a different compiler, different compiler settings or on a different type of hardware, and the results will change.
Each time you run this build, it will most likely give you the same result, but make some changes to the code and unpredictable results will occur.
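That printf returns the number of characters written is easy to check on its own (a small stand-alone sketch):
#include <stdio.h>

int main(void) {
    /* "y = 1479046720 \n" is 16 characters, newline included,
       which matches the mysterious 16 in the output above. */
    int n = printf("y = 1479046720 \n");
    printf("%d\n", n);            /* prints 16 */
    return 0;
}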
To answer the question of why the code does not crash: not all broken code actually crashes. Possibly the parameters (and the return value) of f are passed in registers instead of being pushed onto the stack, so the mismatch between expected and actual arguments does not translate into a stack misalignment. If you tried with more arguments, enough to require the stack, you would probably get that crash, or some possibly dangerous behaviour (a similar technique is used in some security exploits, after all).
Then to clarify usage of function pointers, I have taken the liberty of rewriting the code with a couple of comments.
#include <stdio.h>
#include <stdlib.h>
/* We want to be able to print a return value, so we make
f return something - a long, so as to tell it from int
- in order to be able to get something back.
*/
static long f(int x, int y){
printf("x = %d \n", x );
printf("y = %d \n", y );
return x + y;
}
/* We want the signature of a function returning long, and taking
two int's, since this is what f() is.
*/
typedef long (*FUNCPTR)(int, int);
typedef long (FUNC)(int, int);
int main(void)
{
/* We want a pointer to f - it works either way */
FUNCPTR addr = f;
FUNC *addr2 = f;
/* addr points to a function taking two ints, so we must pass it
two ints, and format return as long decimal */
printf("%ld\n", addr(5, 7));
/* addr2 is the same */
printf("%ld\n", addr2(5, 7));
return 0;
}
Expected output:
$ gcc -W -Wall -o test test.c
$ ./test
x = 5
y = 7
12
x = 5
y = 7
12

What are the implications of using static const instead of #define?

gcc complains about this:
#include <stdio.h>
static const int YY = 1024;
extern int main(int argc, char*argv[])
{
static char x[YY];
}
$ gcc -c test1.c
test1.c: In function `main':
test1.c:5: error: storage size of `x' isn't constant
test1.c:5: error: size of variable `x' is too large
Remove the “static” from the definition of x and all is well.
I'm not exactly clear what's going on here: surely YY is constant?
I had always assumed that the "static const" approach was preferable to "#define". Is there any way of using "static const" in this situation?
In C, a const variable isn't a "real" compile-time constant... it's really just a normal variable that you're not allowed to modify. Because of this, you can't use a const int variable to specify the size of an array.
Now, gcc has an extension that allows you to specify the size of an array at runtime if the array is created on the stack. This is why, when you leave off the static from the definition of x, the code compiles. However, this would still not be legal in standard C.
The solution: Use a #define.
Edit: Note that this is a point in which C and C++ differ. In C++, a const int is a real compile-time constant and can be used to specify the size of arrays and the like.
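To see the distinction the answer is drawing, here is a sketch (the non-static case relies on variable length arrays, standard since C99 and accepted by gcc as an extension in older modes):
#include <stdio.h>

static const int YY = 1024;

int main(void) {
    char ok[YY];                  /* accepted: a variable length array on the stack */
    /* static char bad[YY]; */    /* rejected: a static array needs a true
                                     compile-time constant for its size */
    printf("%zu\n", sizeof ok);   /* 1024, computed at run time */
    return 0;
}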
You may use an enum or a #define to declare the size:
#define XX 1024
static int const YY = 1024;
enum{ ZZ = 1024 };
extern int main(void){
static char x[XX]; // no error
*(int*)&XX = 123; // error: lvalue required as unary ‘&’ operand
static char y[YY]; // error: storage size of ‘y’ isn’t constant
*(int*)&YY = 123; // compiles without error, but modifying a const object is undefined behavior
static char z[ZZ]; // no error
*(int*)&ZZ = 123; // error: lvalue required as unary ‘&’ operand
}
Because you declared x as 'static', it has static storage like a global variable; it is just known only to the main() function in which it is declared. By declaring YY outside of any function, you have made it global; 'static' also gives it static storage, but makes it known only to this file.
If you declared YY as just 'const int YY = 1024', the compiler might treat it like a #define, but with a type. That depends on the compiler.
At this point 2 things might be wrong.
1:
All globals are initialized at runtime, before main() is called.
Since both x and YY are globals, they are both initialized then.
So, the runtime initialization of global x will have to allocate space according to the value in YY. If the compiler is not treating YY like #define with a type, it has to make a compile-time judgement about runtime values. It may be assuming the largest possible value for an int, which really would be too big. (Or possibly negative since you left it signed.)
It may be interesting to see what happens if you only change YY to a short, preferably an unsigned short. Then its max would be 64K.
2:
The size of global space may be limited on your system. You didn't specify the target platform and OS, but some have only so much.
Since you declared x as size YY, you have set it to take YY chars from global space. Every char in it would essentially be a global. If the global space on your system is limited, then 1024 chars of it may be too much.
If you declared x as a pointer to char, then it would take sizeof(char*) bytes. (4 bytes is the size of a pointer on most systems.) With this, you would need to set the pointer to the address of properly malloc'd space.
By declaring x without 'static', it becomes a local variable that is only created when the owning function executes, and its space is taken from the stack, not global space. (This can still be a problem on systems or threads with a very limited stack.) YY's value has long since been set by that point, so there is no problem.
Also:
I don't recall if there is any guarantee that globals are initialized in any particular order. If not, then x could be initialized before YY, and if that happened YY would just contain the random contents of RAM.
To follow on from Martin B's answer, you could do this:
#include <stdio.h>
#define XSIZE 1024
static const int YY = XSIZE;
int main(int argc, char*argv[])
{
static char x[XSIZE];
}
/* SHOULDN'T THIS WORK IN C++? */
// .h
class C {
public:
static const int kSIZE;
int a[kSIZE]; // error: array bound is not an integer constant
};
// .cc
const int C::kSIZE = 1;
/* WORKS FINE */
// .h
class C {
public:
enum eCONST { kSIZE = 1 };
int a[kSIZE];
};
