Unexpected output after linking - c

I have two C files:
main.c
#include <stdio.h>
int sum(int n);
double array[2] = { 0.001, 1.0001 };
int main()
{
int val = sum(2);
printf("%d\n", val);
return 0;
}
sum.c
extern int array[2];
int sum(int n)
{
int i, ret = 0;
for (i = 0; i < n; i++) {
ret += array[i];
}
return ret;
}
I compile and link the files and then run the executable, but I am getting some unexpected output:
306318409
Why is that happening?

C Standard section 6.2.7/2 says
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
So anything could have happened. That's what you get when you lie to the compiler.

This example is a good illustration of the difference between compiling and linking.
When the two files are compiled:
main.c
declares a function sum() with return type int; its definition is only found later, at link time.
defines the variable array with type double[2] and adds it to its list of symbols.
sum.c:
declares array as extern, so the compiler leaves the symbol to be resolved at link time.
defines the function sum(), which uses array with type int[2]; the compiler tracks that type for this file only, independently of main.c.
Now at link time:
the sum() call in main.c is resolved to the definition in sum.c.
the extern reference to array in sum.c is resolved to the double[2] object defined in main.c. The linker matches symbols by name only, so this does not change the code already generated for sum(), which still treats the memory as int.
An array of doubles typically stores each element in 8 bytes, while sum() assumes 4-byte elements. So array[0] in sum() reads the first 4 bytes and array[1] reads the next 4 bytes, and together these are just the 8 bytes of the single value 0.001 from main.c. Reinterpreting those bytes as two ints produces the unexpected number; formally, the behavior is undefined.

Related

Global variable initialized to 1 instead of 0 in my C program

Suppose I declare a global variable F and print its value in the main function; 0 is printed (as it should be).
But when I pass the argument int F in the main function, while declaring the global variable F in exactly the same way as before, printing the value of F (compiled with gcc) prints 1.
Can anyone explain why that is?
This is my code below.
#include<stdio.h>
int F;
int main(int F){
printf("F is %d\n", F);
return 0;
}
When your main function has a parameter with the same name as a global variable, uses of that name inside main refer to the parameter (a local variable), not to the global variable. Without the parameter, the global is the one that gets printed:
#include <stdio.h>
static int F;
int main(){
printf("F is %d", F);
return 0;
}
"But when I pass the argument int F in the main function": that F is nothing but the argument count, i.e. argc.
In the case you describe, the global F and the F declared as the parameter of main() are different variables.
int main(int F){ printf("F is %d\n", F); return 0; }
Here printf() prints 1 because when you run your executable as ./a.out the number of command-line arguments is 1 (the program name itself); F plays the role of argc.
Your compiler could have warned you about the argument provided to main(); compile with the -Wall flag and read the warning. Also check the main() prototype. From the C standard:
ISO/IEC 9899:1999
§5.1.2.2.1 Program startup
¶1 The function called at program startup is named main. The
implementation declares no prototype for this function. It shall be
defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any
names may be used, as they are local to the function in which they are
declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
In your posted code, main has a parameter named F. Inside main, any reference to F refers to that parameter, not to the global variable.
You make life unnecessarily hard by choosing poor variable names. Most of the time, the arguments to main are named argc and argv, for argument count and argument values.
int main(int argc, char** argv) { ... }
It's good to use variable names that have meaning. A global variable named F is not meaningful either; give it a descriptive name and you will run into problems like this one far less often.
You should make main() conform to the requirements of the standard, and you should print the global F as well as the argument F:
#include <stdio.h>
int F;
int main(int F, char **G)
{
printf("F is %d\n", F);
{
extern int F;
printf("%s: F is %d\n", G[0], F);
}
return 0;
}
When compiled (from source file quirky43.c to program quirky43), and run, I get:
$ gcc -O3 -g -std=c11 -Wall -Wextra -Werror quirky43.c -o quirky43
$ ./quirky43
F is 1
quirky43: F is 0
$ ./quirky43 this is why you get command line arguments
F is 9
quirky43: F is 0
$
The first printf() is printing the first argument to main() (conventionally called argc, but there's nothing wrong with calling it F except that it is unexpected). The second one prints the global variable F (and also the program name, conventionally argv[0] but again there's nothing wrong with using G except that it is unexpected). The extern int F; inside a set of braces means that F in that statement block refers to a variable F defined outside the enclosing scope, which means the file scope variable F — which, it may be noted, is correctly initialized to 0. The 1 comes because you invoked the program without arguments, and the argument count includes the program name. It's also why the value printed was 9 when 8 arguments were added to the command line.
Note that another good compilation option to use is -Wshadow:
$ gcc -O3 -g -std=c11 -Wall -Wextra -Werror -Wshadow quirky43.c -o quirky43
quirky43.c: In function ‘main’:
quirky43.c:6:14: error: declaration of ‘F’ shadows a global declaration [-Werror=shadow]
int main(int F, char **G)
~~~~^
quirky43.c:4:5: note: shadowed declaration is here
int F;
^
cc1: all warnings being treated as errors
$
(Compilation with GCC 8.1.0 on a Mac running macOS High Sierra 10.13.5.)

Getting different output for the same program when compiled in Dev-c++ and gcc 4.8.2

#include<stdio.h>
#include<stdlib.h>
#define max 10
void init_graph(int arr[max][max],int v);
void create_graph(int arr[max][max],int v);
void print_graph(int arr[max][max],int v);
int main()
{
int v,n;
printf("Enter the number of vertices :");
scanf("%d",&v);
int arr[v][v];
init_graph(arr,v);
printf("v=%d after init\n",v);
create_graph(arr,v);
print_graph(arr,v);
return 0;
}
void create_graph(int arr[max][max],int v)
{
printf("v=%d \n",v);
int e;
printf("Enter the number of edges :");
scanf("%d",&e);
int i,src=0,dest=0;
for(i=0;i<e;i++)
{
printf("Enter Edge :%d",i+1);
scanf("%d %d",&src,&dest);
if(src<=-1 || src>=v)
{
printf("Invalid source vertex \n");
i--;
continue;
}
if(dest<=-1 || dest >=v)
{
printf("Invalid dest vertex \n");
i--;
continue;
}
//*(*(arr+src)+dest)=1;
//*(*(arr+dest)+src)=1;
arr[src][dest]=1;
arr[dest][src]=1;
}
}
void init_graph(int arr[max][max],int v)
{
int i,j;
for(i=0;i<v;i++)
{
for(j=0;j<v;j++)
{
//*(*(arr+i)+j)=0;
arr[i][j]=0;
}
}
printf("V=%d init_graph\n",v);
}
void print_graph(int arr[max][max],int v)
{
int i,j;
for(i=0;i<v;i++)
{
for(j=0;j<v;j++)
{
//printf("%d ",*(*(arr+i)+j));
printf("%d ",arr[i][j]);
}
printf("\n");
}
}
When I compile the above program in Dev-C++ and in gcc 4.8.2 I get different output. I'm trying to represent a graph using an adjacency matrix.
When v is passed as a parameter to init_graph(arr,v) (in the above program), the value of v in main becomes zero after the function has been called, even though the function does not return any value.
It works properly in Dev-C++, but I get the wrong answer when it is compiled with gcc 4.8.2.
Screenshot of the output in Dev-c++
here v is not becoming 0
Screenshot of the output in gcc 4.8.2
and here v is becoming 0.
You are calling the function:
void init_graph(int arr[10][10], int v);
However your code is:
int arr[v][v];
init_graph(arr,v);
This causes undefined behaviour if v is not 10. The C11 standard clause is 6.5.2.2/6:
If [...] the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined.
Arrays of dimension X are only compatible with arrays of dimension Y if X == Y. (Bear in mind that the outermost, i.e. first, dimension is "lost" due to the array-as-function-parameter syntax quirk, so that dimension can differ without breaking compatibility.)
To fix this you should include the size in the array dimension in the function prototype:
void init_graph(int v, int arr[v][v]);
and similarly for the other functions.
Due to the way "array" function parameters are "adjusted" to pointers in C, the create_graph function really accepts a pointer to an array of length max. It is equivalent to this:
void create_graph(int (*arr)[max], int v)
That means that each step from arr[i] to arr[i+1] in the loop advances by 10 ints, to the next length-10 array. But you are passing (after array decay) a pointer to an array of length v. If v is not the same as max, that is already undefined behaviour (UB). In your case it also takes you out of bounds, which would cause UB by itself in an otherwise well-defined program.
You can only call the function with a pointer to an array of length max, or with an array of arrays whose inner array length is max (the latter decays to a pointer to array).
Note that the kind of platform-dependent behaviour you saw is often a sign that there is UB in the code.
In main() you define your array to be of size [v][v], but init_graph() takes an array of size [max][max]. You need to make these the same. I suggest changing main() since all your other functions also use max as the array size.
What happens is that when v is 5, your [5][5] array is laid out as 25 consecutive ints. But the function thinks the array is size [][10], with row size of 10 ints. So the moment you write to any element past [2][4] (the 25th element), you are writing past the end of the array and clobbering your stack. This is because your array in main() was defined as a local variable and is therefore located on the stack. In your case, the stack also contained the value of v, and it got overwritten with a 0. The reason the other compiler worked is probably because it located v before the array in memory instead of after it, so with that compiler it didn't happen to clobber v.
According to the other answers, calling the function with an incorrect argument invokes "undefined behavior", but I find that to be a lazy explanation ("anything can happen"). I hate it when people say that, because any compiled program is a known quantity and you can determine exactly what the "undefined" behavior actually does (just step through it with a debugger). Once you learn what is going on and where your variables are located, you will start to understand intuitively when memory is getting corrupted and what code could be responsible for it.

Local variable and static variables

I just want to understand the difference in RAM allocation.
Why do I get a RAM overflow if I define a variable before the function, while it is fine when I define it inside the function?
For example:
/*RAM OK*/
void Record(int16_t* current, int i,int n)
{
float Arr[NLOG2] = {0};
for(i=0;i<n;i++)
Arr[i]=current[i*5];
}
/*RAM OVERFLOW*/
static float Arr[NLOG2] = {0};
void Record(int16_t* current, int i,int n)
{
for(i=0;i<n;i++)
Arr[i]=current[i*5];
}
This is the message:
unable to allocate space for sections/blocks with a total estimated
minimum size of 0x330b bytes (max align 0x8) in
<[0x200000c8-0x200031ff]> (total uncommitted space 0x2f38).
The difference is that in the first case, Arr is an automatic variable on the stack; until the function is called, that array doesn't exist. The generated binary contains code for creating the array, but no memory is permanently set aside for the array itself.
In the second case, however, Arr is declared outside of any function (i.e. at file scope). Therefore it exists for the whole run of the program, and the linker has to reserve RAM for it when it lays out the program's sections. Because you appear to be working on an embedded platform with very little RAM, this otherwise insignificant difference causes your "RAM overflow" error.
In the 2nd case, the array is allocated when the application starts. It remains in the memory until the app quits.
In the 1st case, the array is only allocated when function void Record(int16_t* current, int i,int n) is called. The array is gone after the function finishes its execution.
At file scope, the static keyword doesn't have any impact if you have only a single compilation unit (.o file).
Global variables (not static) are exported in the .o file, so the linker can make them available to other files. Therefore, if you have two files like this, you get a name collision on a:
a.c:
#include <stdio.h>
int a;
int compute(void);
int main()
{
a = 1;
printf("%d %d\n", a, compute());
return 0;
}
b.c:
int a;
int compute(void)
{
a = 0;
return a;
}
because the linker doesn't know which of the two global variables named a to use.
However, when you define static globals, you are telling the compiler to keep the variable private to that file and not to let the linker know about it. So if you add static (in the definition of a) to the two sample files above, you won't get a name collision, simply because the linker doesn't even know there is an a in either of the files:
a.c:
#include <stdio.h>
static int a;
int compute(void);
int main()
{
a = 1;
printf("%d %d\n", a, compute());
return 0;
}
b.c:
static int a;
int compute(void)
{
a = 0;
return a;
}
This means that each file works with its own a without knowing about the other ones.

Can functions declared in main access variables declared in main?

I have an array that I declared and initialized in main called Edges.
I have also declared some functions in main that access the array called Edges.
The code compiles and works.
Why does it work? I thought variables declared in main aren't global.
Edit: see Sourav's code.
Actually, if you define a function inside a function, the inner function is visible only to the outer function and NOT in global scope. So the variables you declared and the inner function [more precisely, its code block] share the same scope. Hence, there is no issue accessing the variable.
Check this one
code
#include <stdio.h>
#include <stdlib.h>
int innerfunc();
int main()
{
int outer = 5;
int innerfunc()
{
printf("outer is %d\n", outer);
}
innerfunc();
return 0;
}
output
[sourav@infba01383 so_overflow]# ./a.out
outer is 5
[sourav@infba01383 so_overflow]#
You can't define a function inside a function in standard C. This means that you can't define functions inside main. Compile your code with the -pedantic flag and you will see this warning:
[Warning] ISO C forbids nested functions [-Wpedantic]
I compiled this code
#include <stdio.h>
void print(int *);
int main()
{
int a[2] = {1,3};
void print(int *a)
{
printf("%d", *a);
}
print(a);
return 0;
}
and got the warning
[Warning] ISO C forbids nested functions [-Wpedantic]
First of all, as most answers have mentioned, it is a gcc extension; not part of standard C.
Below answer is strictly confined to gcc.
gcc does treat them as any other function.
e.g. Check below code:
(I took liberty to extend your code as below:)
#include <stdio.h>
#include <stdlib.h>
typedef int operation(int num1, int num2); // for function pointer...
operation* getOperation(char oper)
{
int a=10;
int add(int x, int y){return x+y+a;}
int sub(int x, int y){return x-y+a;}
int nop(int x, int y){return a;}
if(oper=='+')return add;
if(oper=='-')return sub;
return nop;
}
int main()
{
operation *my_op;
my_op=getOperation('+');
printf("%d\n",my_op(5,3));
my_op=getOperation('-');
printf("%d\n",my_op(5,3));
return 0;
}
If you compile it with gcc -S & check the assembly code generated, it would show that
The functions - getOperation & main - are converted to assembly, without any name change. Thus these can be called from any function (in this or even from other file).
e.g.
.globl getOperation /*This line will be missing in case of static functions.*/
.type getOperation, @function
The functions - add, sub, nop - are converted to assembly with some unique random suffix.
e.g.
/*No .globl line is printed here.*/
.type add.2685, @function
Since the names are changed, you cannot call them from other functions. Only the 'parent function' (getOperation in this case) has the information of the function name. (Check for c variable scope for more details.)
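However, you can use them in other functions through function pointers, as shown in the code above. Be aware that the GCC documentation warns that calling a nested function through its address after the containing function has exited is not safe, so returning add/sub/nop from getOperation as above is not guaranteed to work, especially since they read the local variable a.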
Regarding the local variables in getOperation (a for example): They are accessed from add/sub/nop using rbp register.
HINT: Compile a small code having 'local functions' with gcc -S, to understand what's exactly going on.. :-)
Nested functions are not part of the C standard; they are only available as a compiler extension.
You can also declare global variables and use them throughout your program.
Sourav is actually correct: with the extension you can define the function, but its scope is limited to main.

How to determine the length of a function?

Consider the following code that takes the function f(), copies the function itself in its entirety to a buffer, modifies its code and runs the altered function. In practice, the original function that returns number 22 is cloned and modified to return number 42.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ENOUGH 1000
#define MAGICNUMBER 22
#define OTHERMAGICNUMBER 42
int f(void)
{
return MAGICNUMBER;
}
int main(void)
{
int i,k;
char buffer[ENOUGH];
/* Pointer to original function f */
int (*srcfptr)(void) = f;
/* Pointer to hold the manipulated function */
int (*dstfptr)(void) = (void*)buffer;
char* byte;
memcpy(dstfptr, srcfptr, ENOUGH);
/* Replace magic number inside the function with another */
for (i=0; i < ENOUGH; i++) {
byte = ((char*)dstfptr)+i;
if (*byte == MAGICNUMBER) {
*byte = OTHERMAGICNUMBER;
}
}
k = dstfptr();
/* Prints the other magic number */
printf("Hello %d!\n", k);
return 0;
}
The code now relies on just guessing that the function will fit in the 1000 byte buffer. It also violates rules by copying too much to the buffer, since the function f() will be most likely a lot shorter than 1000 bytes.
This brings us to the question: Is there a method to figure out the size of any given function in C? Some methods include looking into intermediate linker output, and guessing based on the instructions in the function, but that's just not quite enough. Is there any way to be sure?
Please note: It compiles and works on my system but doesn't quite adhere to standards because conversions between function pointers and void* aren't exactly allowed:
$ gcc -Wall -ansi -pedantic fptr.c -o fptr
fptr.c: In function 'main':
fptr.c:21: warning: ISO C forbids initialization between function pointer and 'void *'
fptr.c:23: warning: ISO C forbids passing argument 1 of 'memcpy' between function pointer and 'void *'
/usr/include/string.h:44: note: expected 'void * __restrict__' but argument is of type 'int (*)(void)'
fptr.c:23: warning: ISO C forbids passing argument 2 of 'memcpy' between function pointer and 'void *'
/usr/include/string.h:44: note: expected 'const void * __restrict__' but argument is of type 'int (*)(void)'
fptr.c:26: warning: ISO C forbids conversion of function pointer to object pointer type
$ ./fptr
Hello 42!
$
Please note: on some systems executing from writable memory is not possible and this code will crash. It has been tested with gcc 4.4.4 on Linux running on x86_64 architecture.
You cannot do this in C. Even if you knew the length, the address of a function matters, because function calls and accesses to certain types of data will use program-counter-relative addressing. Thus, a copy of the function located at a different address will not do the same thing as the original. Of course there are many other issues too.
The C standard has no notion of introspection or reflection, so you need to devise a method yourself, as you have done. Some safer methods exist, however.
There are two ways:
Disassemble the function (at runtime) until you hit the final RETN/JMP/etc., while accounting for switch/jump tables. This requires some heavy analysis of the function you disassemble (using an engine like beaEngine). It is the most reliable method, but it is slow and heavy.
Abuse compilation units. This is very risky and not foolproof, but if you know your compiler emits functions sequentially within their compilation unit, you can do something along these lines:
void MyFunc()
{
//...
}
void MyFuncSentinel()
{
}
//somewhere in code
size_t z = (uintptr_t)MyFuncSentinel - (uintptr_t)MyFunc;
uint8_t* buf = (uint8_t*)malloc(z);
memcpy(buf,(char*)MyFunc,z);
This will include some extra padding, but it will be minimal (and unreachable). Although highly risky, it is a lot faster than the disassembly method.
Note: both methods require that the target code has read permissions.
@R.. raises a very good point: your code won't be relocatable unless it is PIC or you reassemble it in place to adjust the addresses, etc.
Here is a standards compliant way of achieving the result you want:
#include <stdio.h>
#define OTHERMAGICNUMBER 42
int f(int magicNumber)
{
return magicNumber;
}
int main(void)
{
int k = f(OTHERMAGICNUMBER);
/* Prints the other magic number */
printf("Hello %d!\n", k);
return 0;
}
Now, you may have lots of uses of f() all over the place with no arguments and not want to go through your code changing every one, so you could do this instead
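#include <stdio.h>
#define MAGICNUMBER 22
#define OTHERMAGICNUMBER 42
/* Declared first so that f() can call it */
int newf(int magicNumber);
int f(void)
{
return newf(MAGICNUMBER);
}
int newf(int magicNumber)
{
return magicNumber;
}
int main(void)
{
int k = newf(OTHERMAGICNUMBER);
/* Prints the other magic number */
printf("Hello %d!\n", k);
return 0;
}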
I'm not suggesting this is a direct answer to your problem but that what you are doing is so horrible, you need to rethink your design.
Well, you can obtain the length of a function at runtime using labels (this relies on GCC's "labels as values" extension, the && operator, which is not standard C):
int f()
{
int length;
start:
length = &&end - &&start + 11; // 11 is the length of function prologue
// and epilogue, got with gdb
printf("Magic number: %d\n", MagicNumber);
end:
return length;
}
After executing this function we know its length, so we can malloc a buffer of the right size, copy and edit the code, and then execute it.
int main()
{
int (*pointerToF)(), (*newFunc)(), length, i;
char *buffer, *byte;
length = f();
buffer = malloc(length);
if(!buffer) {
printf("can't malloc\n");
return 0;
}
pointerToF = f;
newFunc = (void*)buffer;
memcpy(newFunc, pointerToF, length);
for (i=0; i < length; i++) {
byte = ((char*)newFunc)+i;
if (*byte == MagicNumber) {
*byte = CrackedNumber;
}
}
newFunc();
}
Now there's another, bigger problem though, the one @R.. mentioned. Calling this function once it has been (correctly) modified results in a segmentation fault at the call to printf, because the call instruction specifies an offset that is wrong at the new address. You can see this with gdb, using disassemble f to see the original code and x/15i buffer to see the edited one.
By the way, both my code and yours compile without warnings but crash on my machine (gcc 4.4.3) when calling the edited function.
