Ambiguous behaviour of strcmp() - c

Please note that I have checked the relevant questions to this title, but from my point of view they are not related to this question.
Initially I thought that program1 and program2 would give me the same result.
//Program 1
char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));
//Output: -4
//Program 2
printf("%d", strcmp("abcd", "efgh"));
//Output: -1
Only difference that I can spot is that in the program2 I have passed string literal, while in program I've passed char * as the argument of the strcmp() function.
Why there is a difference between the behaviour of these seemingly same program?
Platform: Linux mint
compiler: g++
Edit: Actually the program1 always prints the difference of ascii code of the first mismatched characters, but the program2 print -1 if the ascii code of the first mismatched character in string2 is greater than that of string1 and vice versa.

This is your C code:
int x1()
{
char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));
}
int x2()
{
printf("%d", strcmp("abcd", "efgh"));
}
And this is the generated assembly output for both functions:
.LC0:
.string "abcd"
.LC1:
.string "efgh"
.LC2:
.string "%d"
x1:
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], OFFSET FLAT:.LC0
mov QWORD PTR [rbp-16], OFFSET FLAT:.LC1
mov rdx, QWORD PTR [rbp-16]
mov rax, QWORD PTR [rbp-8]
mov rsi, rdx
mov rdi, rax
call strcmp // the strcmp function is actually called
mov esi, eax
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
nop
leave
ret
x2:
push rbp
mov rbp, rsp
mov esi, -1 // strcmp is never called, the compiler
// knows what the result will be and it just
// uses -1
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
nop
pop rbp
ret
When the compiler sees strcmp("abcd", "efgh") it knows the result beforehand, because it knows that "abcd" comes before "efgh".
But if it sees strcmp(a,b) it does not know and hence generates code that actually calls strcmp.
With another compiler or with different compiler settings things could be different. You really shouldn't care about such details at least at a beginner's level.

It is indeed surprising that strcmp returns 2 different values for these calls, but it is not incompatible with the C Standard:
strcmp() returns a negative value if the first string is lexicographically before the second string. Both -4 and -1 are negative values.
As pointed by others, the code generated for the different calls is different:
the compiler generates a call to the library function in the first program
the compiler is able to determine the result of the comparison and generates an explicit result of -1 for the second case where both arguments are string literals.
In order to perform this compile time evaluation, strcmp must be defined in a subtile way in <string.h> so the compiler can determine that the program refers to the C library's implementation and not an alternative that might behave differently. Tracing the corresponding prototype in recent GNU libc include files is a bit difficult with a number of nested macros eventually leading to a hidden prototype.
Note that more recent versions of both gcc and clang will perform the optimisation in both cases as can be tested on Godbolt Compiler Explorer, but neither combines this optmisation with that of printf to generate the even more compact code puts("-1");. They seem to convert printf to puts only for string literal formats without arguments.

I believe (would need to see (and interpret) machine code) one version works without calling code in the library (as if you wrote printf("%d", -1);).

Related

Does using a return value change the behavior of a function c?

I had this piece of code.
#include <stdio.h>
int main() {
int i;
scanf("%d", &i);
printf("%x", i);
}
to which, when i give character 'a' as input, it spits out some random numbers in the output like "73152c" or "66152c" etc.
But when I change the code to this,
#include <stdio.h>
int main() {
int i;
int j = scanf("%d", &i);
printf("%x %d", i, j);
}
output will always be "2 0" for same input.
So, does using the return value of a function changes its behavior?
I'm using windows 10 64-bit with gcc 8.1.0 and compiling with no switches.
Using godbolt.org to examine the assembly code generated by GCC 8.1.0 with no switches, here is the assembly code for the main routine in your first program:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-4]
mov rsi,rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call __isoc99_scanf
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
mov eax, 0
leave
ret
and here is the code for your second program:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-8]
mov rsi,rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call __isoc99_scanf
mov DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
mov eax, 0
leave
ret
They differ in two places. In the first, this instruction passes the address of i to scanf:
lea rax, [rbp-4]
In the second, it is this instruction:
lea rax, [rbp-8]
These are different because, in your second program, the compiler has included space for j on the stack. For whatever reason, it decided to put j at rbp-4, the space used for i in the first program. This bumped i to rbp-8.
Then the code differ where the first program passes i to printf:
lea rax, [rbp-8]
and the second passes i and j:
mov eax, DWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-4]
And now we see why your programs print different things for i. In the first program, because a value is never put into i (because scanf makes no assignment for %d when the input contains the letter “a”), your program prints whatever data happened to be in [rbp-4] when main started. In the second program, your program prints whatever happened to be in [rbp-8].
What is in these stack locations is whatever is left from the start-up code that runs for main is called. This is special start-up code that sets up the C environment. It may do things with addresses in your program, and some addresses in your program are deliberately randomized in each execution by the program loader to foil attackers. (For further information, look into address space layout randomization.) It appears when the start-up code is done, it leaves some address in [rbp-4] and zero in [rbp-8]. So your first program prints some address for i and your second program prints zero.
So, the differences in this case were not caused by using or not using the return value of scanf. They were caused by having more or fewer variables, resulting in changes in where things were put on the stack.
This can of course change if you upgrade your C implementation and a different version of the start-up code is used or the compiler generates different code. Turning on optimization in the compiler, as with the -O3 switch, is likely to change the behavior too.
scanf returns the number of items successfully converted and assigned, or EOF on an end-of-file or error condition.
The %d conversion specifier tells scanf to skip over any leading whitespace, then read decimal digits up to the first non-digit character; that non-digit character is left in the input stream, and any digits that have been read are converted to an integer value and assigned to i.
If the first non-whitespace character is not a digit, then scanf stops reading immediately, returns 0 to indicate nothing was converted and assigned, and i is unchanged.
In the first snippet you don't initialize i before reading it and it is not being updated, which is why the output is seemingly random. You've technically invoked undefined behavior at this point by using that indeterminate value. One possible outcome of undefined behavior is getting a different result each time.
In the second snippet j gets the return value of scanf, which is 0. As for why i is always set to 2, again, since you're trying to use that indeterminate value, the behavior is undefined. Another possible outcome of undefined behavior is that you get the same result each time, for no apparent reason.
Actually, the behaviour of scanf is not changing.. You can conform this by including this line, printf("%x", i); before scanf. Instead here, my compiler initializes i to 2 when I use j. Otherwise i will not be initialized, and will have some random garbage value.
I don't think entering input of wrong type is undefined behavior - Correct me if I'm wrong. Instead it keeps the corresponding and following pointers untouched, and immediately returns from scanf. I ran multiple runs and every time value of i was unchanged.
Now the surprising thing is that, when j is declared, why i is always initialized to 2!!

Loop fission/invariant optimization not performed, why?

I am trying to learn more about assembly and which optimizations compilers can and cannot do.
I have a test piece of code for which I have some questions.
See it in action here: https://godbolt.org/z/pRztTT, or check the code and assembly below.
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
for (int j = 0; j < 100; j++) {
if (argc == 2 && argv[1][0] == '5') {
printf("yes\n");
}
else {
printf("no\n");
}
}
return 0;
}
The assembly produced by GCC 10.1 with -O3:
.LC0:
.string "no"
.LC1:
.string "yes"
main:
push rbp
mov rbp, rsi
push rbx
mov ebx, 100
sub rsp, 8
cmp edi, 2
je .L2
jmp .L3
.L5:
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
je .L4
.L2:
mov rax, QWORD PTR [rbp+8]
cmp BYTE PTR [rax], 53
jne .L5
mov edi, OFFSET FLAT:.LC1
call puts
sub ebx, 1
jne .L2
.L4:
add rsp, 8
xor eax, eax
pop rbx
pop rbp
ret
.L3:
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
je .L4
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
jne .L3
jmp .L4
It seems like GCC produces two versions of the loop: one with the argv[1][0] == '5' condition but without the argc == 2 condition, and one without any condition.
My questions:
What is preventing GCC from splitting away the full condition? It is similar to this question, but there is no chance for the code to get a pointer into argv here.
In the loop without any condition (L3 in assembly), why is the loop body duplicated? Is it to reduce the number of jumps while still fitting in some sort of cache?
GCC doesn't know that printf won't modify memory pointed-to by argv, so it can't hoist that check out of the loop.
argc is a local variable (that can't be pointed-to by any pointer global variable), so it knows that calling an opaque function can't modify it. Proving that a local variable is truly private is part of Escape Analysis.
The OP tested this by copying argv[1][0] into a local char variable first: that let GCC hoist the full condition out of the loop.
In practice argv[1] won't be pointing to memory that printf can modify. But we only know that because printf is a C standard library function, and we assume that main is only called by the CRT startup code with the actual command line args. Not by some other function in this program that passes its own args. In C (unlike C++), main is re-entrant and can be called from within the program.
Also, in GNU C, printf can have custom format-string handling functions registered with it. Although in this case, the compiler built-in printf looks at the format string and optimizes it to a puts call.
So printf is already partly special, but I don't think GCC bothers to look for optimizations based on it not modifying any other globally-reachable memory. With a custom stdio output buffer, that might not even be true. printf is slow; saving some spill / reloads around it is generally not a big deal.
Would (theoretically) compiling puts() together with this main() allow the compiler to see puts() isn't touching argv and optimize the loop fully?
Yes, e.g. if you'd written your own write function that uses an inline asm statement around a syscall instruction (with a memory input-only operand to make it safe while avoiding a "memory" clobber) then it could inline and assume that argv[1][0] wasn't changed by the asm statement and hoist a check based on it. Even if you were outputting argv[1].
Or maybe do inter-procedural optimization without inlining.
Re: unrolling: that's odd, -funroll-loops isn't on by default for GCC at -O3, only with -O3 -fprofile-use. Or if enabled manually.

Assembly: Purpose of loading the effective address before a call to a function?

Source C Code:
int main()
{
int i;
for(i=0, i < 10; i++)
{
printf("Hello World!\n");
}
}
Dump of Intel syntax x86 assembler code for function main:
1. 0x000055555555463a <+0>: push rbp
2. 0x000055555555463b <+1>: mov rbp,rsp
3. 0x000055555555463e <+4>: sub rsp,0x10
4. 0x0000555555554642 <+8>: mov DWORD PTR [rbp-0x4],0x0
5. 0x0000555555554649 <+15>: jmp 0x55555555465b <main+33>
6. 0x000055555555464b <+17>: lea rdi,[rip+0xa2] # 0x5555555546f4
7. 0x0000555555554652 <+24>: call 0x555555554510 <puts#plt>
8. 0x0000555555554657 <+29>: add DWORD PTR [rbp-0x4],0x1
9. 0x000055555555465b <+33>: cmp DWORD PTR [rbp-0x4],0x9
10. 0x000055555555465f <+37>: jle 0x55555555464b <main+17>
11. 0x0000555555554661 <+39>: mov eax,0x0
12. 0x0000555555554666 <+44>: leave
13. 0x0000555555554667 <+45>: ret
I'm currently working through "Hacking, The Art of Exploitation 2nd Edition by Jon Erickson", and I'm just starting to tackle assembly.
I have a few questions about the translation of the provided C code to Assembly, but I am mainly wondering about my first question.
1st Question: What is the purpose of line 6? (lea rdi,[rip+0xa2]).
My current working theory, is that this is used to save where the next instructions will jump to in order to track what is going on. I believe this line correlates with the printf function in the source C code.
So essentially, its loading the effective address of rip+0xa2 (0x5555555546f4) into the register rdi, to simply track where it will jump to for the printf function?
2nd Question: What is the purpose of line 11? (mov eax,0x0?)
I do not see a prior use of the register, EAX and am not sure why it needs to be set to 0.
The LEA puts a pointer to the string literal into a register, as the first arg for puts. The search term you're looking for is "calling convention" and/or ABI. (And also RIP-relative addressing). Why is the address of static variables relative to the Instruction Pointer?
The small offset between code and data (only +0xa2) is because the .rodata section gets linked into the same ELF segment as .text, and your program is tiny. (Newer gcc + ld versions will put it in a separate page so it can be non-executable.)
The compiler can't use a shorter more efficient mov edi, address in position-independent code in your Linux PIE executable. It would do that with gcc -fno-pie -no-pie
mov eax,0 implements the implicit return 0 at the end of main that C99 and C++ guarantee. EAX is the return-value register in all calling conventions.
If you don't use gcc -O2 or higher, you won't get peephole optimizations like xor-zeroing (xor eax,eax).
This:
lea rdi,[rip+0xa2]
Is a typical position independent LEA, putting the string address into a register (instead of loading from that memory address).
Your executable is position independent, meaning that it can be loaded at runtime at any address. Therefore, the real address of the argument to be passed to puts() needs to be calculated at runtime every single time, since the base address of the program could be different each time. Also, puts() is used instead of printf() because the compiler optimized the call since there is no need to format anything.
In this case, the binary was most probably loaded with the base address 0x555555554000. The string to use is stored in your binary at offset 0x6f4. Since the next instruction is at offset 0x652, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x6f4 - 0x652) = rip + 0xa2, which is what you see above. See this answer of mine for another example.
The purpose of:
mov eax,0x0
Is to set the return value of main(). In Intel x86, the calling convention is to return values in the rax register (eax if the value is 32 bits, which is true in this case since main returns an int). See the table entry for x86-64 at the end of this page.
Even if you don't add an explicit return statement, main() is a special function, and the compiler will add a default return 0 for you.
If you add some debug data and symbols to the assembly everything will be easier. It is also easier to read the code if you add some optimizations.
There is a very useful tool godbolt and your example https://godbolt.org/z/9sRFmU
On the asm listing there you can clearly see that that lines loads the address of the string literal which will be then printed by the function.
EAX is considered volatile and main by default returns zero and thats the reason why it is zeroed.
The calling convention is explained here: https://en.wikipedia.org/wiki/X86_calling_conventions
Here you have more interesting cases https://godbolt.org/z/M4MeGk

Multiplying values in an array using IMUL instruction produces incorrect values

I'm picking up ASM language and trying out the IMUL function on Ubuntu Eclipse C++, but for some reason I just cant seem to get the desired output from my code.
Required:
Multiply the negative elements of an integer array int_array by a specified integer inum
Here's my code for the above:
C code:
#include <stdio.h>
extern void multiply_function();
// Variables
int iaver, inum;
int int_ar[10] = {1,2,3,4,-9,6,7,8,9,10};
int main()
{
inum = 2;
multiply_function();
for(int i=0; i<10; i++){
printf("%d ",int_ar[i]);
}
}
ASM code:
extern int_ar
extern inum
global multiply_function
multiply_function:
enter 0,0
mov ecx, 10
mov eax, inum
multiply_loop:
cmp [int_ar +ecx*4-4], dword 0
jg .ifpositive
mov ebx, [int_ar +ecx*4-4]
imul ebx
cdq
mov [int_ar +ecx*4-4], eax
loop multiply_loop
leave
ret
.ifpositive:
loop multiply_loop
leave
ret
The Problem
For an array of: {1,2,3,4,-9,6,7,8,9,10} and inum, I get the output {1,2,3,4,-1210688460,6,7,8,9,10} which hints at some sort of overflow occurring.
Is there something I'm missing or understood wrong about how the IMUL function in assembly language for x86 works?
Expected Output
The output I expected is {1,2,3,4,-18,6,7,8,9,10}
My Thought Process
My thought process for the above task:
1) Find which array elements in array are negative, for each positive element found, do nothing and continue loop to next element
cmp [int_ar +ecx*4-4], dword 0
jg .ifpositive
.ifpositive:
loop multiply_loop
leave
ret
2) Upon finding the negative element, move its value into register EBX which will serve as SRC in the IMUL SRC function. Then extend register EAX to EAX-EDX where the result is stored in:
mov ebx, [int_ar +ecx*4-4]
imul ebx
cdq
3) Move the result into the negative element of the array by using MOV:
mov [int_ar +ecx*4-4], eax
4) Loop through to the next array element and repeat the above 1)-3)
Reason for Incorrect Values
If we look past the inefficiencies and unneeded code and deal with the real issue it comes down to this instruction:
mov eax, inum
What is inum? You created and initialized a global variable in C called inum with:
int iaver, inum;
[snip]
inum = 2;
inum as a variable is essentially a label to a memory location containing an int (32-bit value). In your assembly code you need to treat inum as a pointer to a value, not the value itself. In your assembly code you need to change:
mov eax, inum
to:
mov eax, [inum]
What your version does is moves the address of inum into EAX. Your code ended up multiplying the address of the variable by the negative numbers in your array. That cause the erroneous values you see. the square brackets around inum tell the assembler you want to treat inum as a memory operand, and that you want to move the 32-bit value at inuminto EAX.
Calling Convention
You appear to be creating a 32-bit program and running it on 32-bit Ubuntu. I can infer the possibility of a 32-bit Linux by the erroneous value of -1210688460 being returned. -1210688460 = 0xB7D65C34 divide by -9 and you get 804A06C. Programs on 32-bit Linux are usually loaded starting at 0x8048000
Whether running on 32-bit Linux or 64-bit Linux, assembly code linked with 32-bit C/C++ programs need to abide by the CDECL calling convention:
cdecl
The cdecl (which stands for C declaration) is a calling convention that originates from the C programming language and is used by many C compilers for the x86 architecture.1 In cdecl, subroutine arguments are passed on the stack. Integer values and memory addresses are returned in the EAX register, floating point values in the ST0 x87 register. Registers EAX, ECX, and EDX are caller-saved, and the rest are callee-saved. The x87 floating point registers ST0 to ST7 must be empty (popped or freed) when calling a new function, and ST1 to ST7 must be empty on exiting a function. ST0 must also be empty when not used for returning a value.
Your code clobbers EAX, EBX, ECX, and EDX. You are free to destroy the contents of EAX, ECX, and EDX but you must preserve EBX. If you don't you can cause problems for the C code calling the function. After you do the enter 0,0 instruction you can push ebx and just before each leave instruction you can do pop ebx
If you were to use -O1, -O2, or -O3 GCC compiler options to enable optimizations your program may not work as expected or crash altogether.

How is this string in an array represented in assembly when a C program is compiled using the gcc -S option?

This is a C program, which has been compiled to assembly using gcc -S. How is string "Hello, world" represented in this program?
This is the C-code:
1. #include <stdio.h>
2.
3. int main(void) {
4.
5. char st[] = "Hello, wolrd";
6. printf("%s\n", st);
7.
8. return 0;
9. }
Heres the assembly code:
1. .intel_syntax noprefix
2. .text
3. .globl main
4.
5. main:
6. push rbp
7. mov rbp, rsp
8. sub rsp, 32
9. mov rax, QWORD PTR fs:40
10 mov QWORD PTR [rbp-8], rax
11. xor eax, eax
12. movabs rax, 8583909746840200520
15. mov QWORD PTR [rbp-32], rax
14. mov DWORD PTR [rbp-24], 1684828783
15. mov BYTE PTR [rbp-20], 0
16. lea rax, [rbp-32]
17. mov rdi, rax
18. call puts
19. mov eax, 0
20. mov rdx, QWORD PTR [rbp-8]
21. xor rdx, QWORD PTR fs:40
22 je .L3
22. call __stack_chk_fail
23. .L3:
24. leave
25. ret
You are using a local buffer in function main, initialized from a string literal. The compiler compiles this initialization as setting the 16 bytes at [rbp-32] with 3 mov instructions. The first one via rax, the second immediate as the value is 32 bits, the third for a single byte.
8583909746840200520 in decimal is 0x77202c6f6c6c6548 in hex, corresponding to the bytes "Hello, W" in little endian order, 1684828783 is 0x646c726f, the bytes "orld". The third mov sets the final '\0' byte. Hence the buffer contains "Hello, World".
This string is then passed to puts for output to stdout.
Note that gcc optimized the call printf("%s\n", "Hello, World"); to puts("Hello, World");! By the way, clang performs the same optimization.
Interesting.
If you'd written const char *str="...", gcc would have passed puts a pointer to the string sitting there in the .rodata section, like in this godbolt link. (Well-spotted by chqrlie that gcc is optimizing printf to puts).
Your code forces the compiler to make a writeable copy of the string literal, by assigning it to a non-const char[]. (Actually, even with const char str[], gcc still generates it on the fly from mov-immediates. clang-3.7 spots the chance to optimize, though.)
Interestingly, it encodes it into immediate data, rather than copying into the buffer. If the array had been global, it would have just been sitting there in the regular .data section, not .rodata.
Also, in general avoid using main() to see compiler optimization. gcc on purpose marks it as "cold", and optimizes it less. This is an advantage for real programs that do their real work in other functions. No difference in this case, renaming main. But usually if you're looking at how gcc optimizes something, it's best to write a function that takes some args, and use those. Then you don't have to worry about gcc seeing that the inputs or loop-bounds are compile-time constants, either.

Resources