Unexpected behavior when calling implicitly declared function - c

I am learning C and wrote the following code:
#include <stdio.h>
int main()
{
double a = 2.5;
say(a);
}
void say(int num)
{
printf("%u\n", num);
}
When I compile this program, the compiler gives the following warnings:
test.c: In function ‘main’:
test.c:6:2: warning: implicit declaration of function ‘say’ [-Wimplicit-function-declaration]
6 | say(a);
| ^~~
test.c: At top level:
test.c:9:6: warning: conflicting types for ‘say’
9 | void say(int num)
| ^~~
test.c:6:2: note: previous implicit declaration of ‘say’ was here
6 | say(a);
| ^~~
Running the program unexpectedly leads to a 1 being printed. From my limited understanding, because I did not add a function prototype for the compiler, the compiler implicitly creates one from the function call on line 6, expecting a double as a parameter and warns me about this implicit declaration. But later I define the function with a parameter of type int. The compiler gives me two warnings about the type mismatch.
I expect argument coercion, meaning the double will be converted to an integer. But in that case, the output should be 2, and not a 1. What exactly is going on here?

What exactly is going on here?
From the C standard perspective it's undefined behavior.
What exactly is going on here?
I am assuming you are using the x86_64 architecture. The psABI-x86_64 standard defines how arguments are passed to functions on that architecture. The first double argument is passed in the %xmm0 register, while the first integer argument is passed in the %edi register.
Your compiler most probably produces:
main:
push rbp
mov rbp, rsp
sub rsp, 16
movsd xmm0, QWORD PTR .LC0[rip]
movsd QWORD PTR [rbp-8], xmm0
mov rax, QWORD PTR [rbp-8]
movq xmm0, rax ; set xmm0 to the value of double
mov eax, 1 ; al = number of vector registers used; the implicit declaration has no prototype, so gcc treats the call like a variadic one
call say
mov eax, 0
leave
ret
say:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], edi ; read %edi
mov eax, DWORD PTR [rbp-4]
mov esi, eax ; pass value in %edi as argument to printf
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
nop
leave
ret
I.e. main sets %xmm0 to the value of the double, yet say() reads from the %edi register, which is never set anywhere in your code. Because there is a leftover value of 1 in %edi, most probably from crt0 or such, your code prints 1.
Edit: The leftover value actually comes from main's arguments. main is really int main(int argc, char *argv[]), and because your program is passed no arguments, argc is set to 1 by the startup code, which means that the leftover value in %edi is 1.
Well, you can for example "manually" set the %edi value to some value by calling a function that takes int before calling say. The following code prints the value that I put in func() call.
int func(int a) {}
int main() {
func(50); // set %edi to 50
double a = 2.5;
say(a);
}
void say(int num) {
printf("%u\n", num); // will print '50', the leftover in %edi
}

I expect argument coercion
If you had declared the function properly, that's what you'd get. But as you correctly pointed out, you didn't declare the function and got an implicit declaration. An implicit declaration is int say() with no information about the parameters, so when the compiler sees the call it only applies the default argument promotions, which leave a double as a double. Therefore it has no reason to coerce anything. It just generates the usual code for calling a function with a double as an argument.
What exactly is going on here?
In terms of the C language, it's undefined behaviour and that's it.
In terms of implementation, what's likely happening is that, as I said, the compiler will generate the usual code for calling a function with a double. On a 64-bit x86 architecture using the usual calling conventions, this will mean putting the value 2.5 into the XMM0 register and then calling the function. The function itself will assume that the argument is an int, so it will read its value from the EDI register (or ECX using Microsoft's calling convention), which is the register used to pass the first integer argument. So the argument is written into one register and then read from a totally different register, so you'll get whatever happened to be in that register.
Still, what exactly would qualify it as [undefined behaviour]?
The fact that you (implicitly) declared the function using one type, but then defined it using another. If the declaration and definition of a function don't match, that causes undefined behaviour.

Related

Why I am getting weird output from the following code? [duplicate]

This question already has answers here:
Return value from writing an unused parameter when falling off the end of a non-void function
(1 answer)
Checking return value of a function without return statement
(3 answers)
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
(1 answer)
Closed 1 year ago.
I think I have found a problem with the way functions are handled by the gcc compiler.
I don't know if it's a bug or just something I've overlooked over the years.
In practice, when a function is declared with a return type but has no return statement, the compiler leaves the value of the first local variable of the function in the EAX register, and the caller then stores it in a variable. Example:
#include<stdio.h>
int add(int a, int b)
{
int c = a + b;
// there is no return statement
}
int main(void)
{
int res = add(3, 2);
return 0;
}
This is the output:
5
This is the x86-64 Assembly with intel syntax:
Function add:
push rbp
mov rbp, rsp
mov DWORD PTR[rbp-0x14], edi ;store first
mov DWORD PTR[rbp-0x18], esi ;store second
mov edx, DWORD PTR[rbp-0x14]
mov eax, DWORD PTR[rbp-0x18]
add eax, edx
mov DWORD PTR[rbp-0x4], eax
nop
pop rbp
ret
Function main:
push rbp
mov rbp, rsp
sub rsp, 0x10
mov esi, 0x2 ;second parameter
mov edi, 0x3 ;first parameter
call 0x1129 <add>
;WHAT??? eax = a + b, why store it?
mov DWORD PTR[rbp-0x4], eax
mov eax, 0x0
leave
ret
As you can see, it stores the sum of the parameters a and b in the variable c, but then it also stores the eax register, which contains their sum, in the variable res, as if the function had returned a value.
Is this done because the function was defined with a return value?
What you've done is trigger undefined behavior by failing to return a value from a function and then attempting to use that return value.
This is documented in section 6.9.1p12 of the C standard:
If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.
One of the ways that undefined behavior can manifest itself is by the program appearing to work properly, as you've seen. However, there's no guarantee that it will continue to work if for example, you added some unrelated code or compiled with different optimization settings.
eax is the register used to return a value, in this case because the function is supposed to return int. So the caller gets whatever happens to be in that register. However, you should have gotten at least a warning that there is no return statement.
Because your function is pretty small and the compiler decided to use the eax register for its calculation, it appears to work.
If you switch on optimization or provide a more complex function, the result will be quite different.

Does using a return value change the behavior of a function c?

I had this piece of code.
#include <stdio.h>
int main() {
int i;
scanf("%d", &i);
printf("%x", i);
}
When I give it the character 'a' as input, it prints seemingly random numbers like "73152c" or "66152c".
But when I change the code to this,
#include <stdio.h>
int main() {
int i;
int j = scanf("%d", &i);
printf("%x %d", i, j);
}
the output is always "2 0" for the same input.
So, does using the return value of a function change its behavior?
I'm using windows 10 64-bit with gcc 8.1.0 and compiling with no switches.
Using godbolt.org to examine the assembly code generated by GCC 8.1.0 with no switches, here is the assembly code for the main routine in your first program:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-4]
mov rsi,rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call __isoc99_scanf
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
mov eax, 0
leave
ret
and here is the code for your second program:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-8]
mov rsi,rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call __isoc99_scanf
mov DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
mov eax, 0
leave
ret
They differ in two places. In the first, this instruction passes the address of i to scanf:
lea rax, [rbp-4]
In the second, it is this instruction:
lea rax, [rbp-8]
These are different because, in your second program, the compiler has included space for j on the stack. For whatever reason, it decided to put j at rbp-4, the space used for i in the first program. This bumped i to rbp-8.
Then the code differs where the first program passes i to printf:
mov eax, DWORD PTR [rbp-4]
and the second passes i and j:
mov eax, DWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-4]
And now we see why your programs print different things for i. In the first program, because a value is never put into i (because scanf makes no assignment for %d when the input contains the letter “a”), your program prints whatever data happened to be in [rbp-4] when main started. In the second program, your program prints whatever happened to be in [rbp-8].
What is in these stack locations is whatever is left over from the start-up code that runs before main is called. This is special start-up code that sets up the C environment. It may do things with addresses in your program, and some addresses in your program are deliberately randomized in each execution by the program loader to foil attackers. (For further information, look into address space layout randomization.) It appears that when the start-up code is done, it leaves some address in [rbp-4] and zero in [rbp-8]. So your first program prints some address for i and your second program prints zero.
So, the differences in this case were not caused by using or not using the return value of scanf. They were caused by having more or fewer variables, resulting in changes in where things were put on the stack.
This can of course change if you upgrade your C implementation and a different version of the start-up code is used or the compiler generates different code. Turning on optimization in the compiler, as with the -O3 switch, is likely to change the behavior too.
scanf returns the number of items successfully converted and assigned, or EOF on an end-of-file or error condition.
The %d conversion specifier tells scanf to skip over any leading whitespace, then read decimal digits up to the first non-digit character; that non-digit character is left in the input stream, and any digits that have been read are converted to an integer value and assigned to i.
If the first non-whitespace character is not a digit, then scanf stops reading immediately, returns 0 to indicate nothing was converted and assigned, and i is unchanged.
In the first snippet you don't initialize i before reading it and it is not being updated, which is why the output is seemingly random. You've technically invoked undefined behavior at this point by using that indeterminate value. One possible outcome of undefined behavior is getting a different result each time.
In the second snippet j gets the return value of scanf, which is 0. As for why i is always set to 2, again, since you're trying to use that indeterminate value, the behavior is undefined. Another possible outcome of undefined behavior is that you get the same result each time, for no apparent reason.
Actually, the behaviour of scanf is not changing. You can confirm this by adding the line printf("%x", i); before the scanf call. On my machine, i happens to hold 2 when j is also declared. Otherwise i is not initialized and holds some garbage value.
I don't think entering input of wrong type is undefined behavior - Correct me if I'm wrong. Instead it keeps the corresponding and following pointers untouched, and immediately returns from scanf. I ran multiple runs and every time value of i was unchanged.
Now the surprising thing is that, when j is declared, why i is always initialized to 2!!

When return type is default in function header and return statement is not provided, why is undefined behavior happening? [duplicate]

This question already has answers here:
Is there a default return value of C++ functions? [duplicate]
(2 answers)
Why does flowing off the end of a non-void function without returning a value not produce a compiler error?
(11 answers)
Closed 2 years ago.
When I execute the following program in GCC, I expect output to be 0 because the return statement is missing in max() function, but when I execute it, the output is 10. What is the reason for such behavior??
#include <stdio.h>
max(int x, int y)
{
if (x > y)
return x;
else
x=5;
}
int main(void)
{
int a = 10, b = 20;
// Calling above function to find max of 'a' and 'b'
int m = max(a, b);
printf("%d", m);
}
When return type is default in function header and return statement is not provided, why is undefined behavior happening?
Because the standard does not define what should happen. Thus the catchy term "undefined behavior". If the standard does not define what should happen... then literally anything can happen, the code can behave in any way.
It is explicitly stated that your code has undefined behavior in the standard C11 6.9.1p12.
What is the reason for such behavior??
Because the behavior is not defined, the compiler is not required to generate any code with any predictable behavior. So it generates some code that just happens to act as-if the function would have returned 10, but no one really cares because it can do what it wants.
The pragmatic answer comes from inspecting the assembly generated by the compiler (godbolt link) (once again, the compiler can do what it wants). In main, the compiler emits mov DWORD PTR [rbp-4], 10, then mov eax, DWORD PTR [rbp-4], then mov edi, eax, i.e. the edi register holds 10. Then in max the compiler emits mov DWORD PTR [rbp-4], edi and mov eax, DWORD PTR [rbp-4], and then returns from the function, so when max returns, the eax register holds 10. Because the compiler then uses mov esi, eax to set up the argument for printf, printf prints the value 10, which is what was left over in the eax register.
When I execute the following program in GCC
You don't execute something in the compiler!
You compile your code with gcc!
I expect output to be 0 because the return statement is missing
Your expectation is simply wrong. Neither C nor C++ has a "default return value in case of a missing return statement". So you simply must provide a return statement with a value if your function is not declared void, to make your program well formed!
Most compilers will give you at minimum a warning when warnings are enabled. You should always compile with -Wall or, better, -Wall -Wextra -pedantic.
It's a good practice to see the assembly code for more clarification. Let's see the assembly code for the function max(int,int).
Assembly code obtained from gcc -S [filename].c
max: //max() function starts here
.LFB0:
.cfi_startproc
endbr64
pushq %rbp //Stack Push
.cfi_def_cfa_offset 16 //ignore
.cfi_offset 6, -16 //ignore
movq %rsp, %rbp //ignore
.cfi_def_cfa_register 6 //ignore
movl %edi, -4(%rbp) //store the first argument x on the stack at rbp-4
movl %esi, -8(%rbp) //store the second argument y on the stack at rbp-8
movl -4(%rbp), %eax //load x into the eax register
cmpl -8(%rbp), %eax //compare x with y
jle .L2 //if x <= y, jump to .L2 (the else branch)
movl -4(%rbp), %eax //otherwise (x > y): load x into eax as the return value
jmp .L1 //jump to L1
.L2:
movl $5, -4(%rbp) //x = 5: store 5 at rbp-4; eax still holds the old x
.L1:
popq %rbp //Stack Pop
.cfi_def_cfa 7, 8 //ignore
ret //return control to the caller, which takes whatever is in eax as the return value
.cfi_endproc
As we can conclude from the above assembly code, no matter how the condition evaluates, it is always the content of the eax register that the caller receives as the return value (on x86 systems).
And in this case the compiler had loaded the first argument, i.e. x (the value of a), into the eax register.

Ambiguous behaviour of strcmp()

Please note that I have checked the relevant questions to this title, but from my point of view they are not related to this question.
Initially I thought that program1 and program2 would give me the same result.
//Program 1
char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));
//Output: -4
//Program 2
printf("%d", strcmp("abcd", "efgh"));
//Output: -1
The only difference that I can spot is that in program2 I have passed string literals, while in program1 I've passed char * variables as the arguments of the strcmp() function.
Why is there a difference between the behaviour of these seemingly identical programs?
Platform: Linux mint
compiler: g++
Edit: Actually the program1 always prints the difference of ascii code of the first mismatched characters, but the program2 print -1 if the ascii code of the first mismatched character in string2 is greater than that of string1 and vice versa.
This is your C code:
int x1()
{
char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));
}
int x2()
{
printf("%d", strcmp("abcd", "efgh"));
}
And this is the generated assembly output for both functions:
.LC0:
.string "abcd"
.LC1:
.string "efgh"
.LC2:
.string "%d"
x1:
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], OFFSET FLAT:.LC0
mov QWORD PTR [rbp-16], OFFSET FLAT:.LC1
mov rdx, QWORD PTR [rbp-16]
mov rax, QWORD PTR [rbp-8]
mov rsi, rdx
mov rdi, rax
call strcmp // the strcmp function is actually called
mov esi, eax
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
nop
leave
ret
x2:
push rbp
mov rbp, rsp
mov esi, -1 // strcmp is never called, the compiler
// knows what the result will be and it just
// uses -1
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
nop
pop rbp
ret
When the compiler sees strcmp("abcd", "efgh") it knows the result beforehand, because it knows that "abcd" comes before "efgh".
But if it sees strcmp(a,b) it does not know and hence generates code that actually calls strcmp.
With another compiler or with different compiler settings things could be different. You really shouldn't care about such details at least at a beginner's level.
It is indeed surprising that strcmp returns 2 different values for these calls, but it is not incompatible with the C Standard:
strcmp() returns a negative value if the first string is lexicographically before the second string. Both -4 and -1 are negative values.
As pointed by others, the code generated for the different calls is different:
the compiler generates a call to the library function in the first program
the compiler is able to determine the result of the comparison and generates an explicit result of -1 for the second case where both arguments are string literals.
In order to perform this compile-time evaluation, strcmp must be defined in a subtle way in <string.h> so the compiler can determine that the program refers to the C library's implementation and not an alternative that might behave differently. Tracing the corresponding prototype in recent GNU libc include files is a bit difficult, with a number of nested macros eventually leading to a hidden prototype.
Note that more recent versions of both gcc and clang will perform the optimisation in both cases, as can be tested on Godbolt Compiler Explorer, but neither combines this optimisation with that of printf to generate the even more compact code puts("-1");. They seem to convert printf to puts only for string literal formats without arguments.
I believe (I would need to see and interpret the machine code to be sure) that one version works without calling code in the library, as if you had written printf("%d", -1);.

Resources