Is there a command execution vulnerability in this C program? - c

So I am working on a challenge problem to find a vulnerability in a C program binary that allows a command to be executed by the program (using the effective UID in Linux).
I am really struggling to find how to do this with this particular program.
The disassembly of the function in question (main function):
**************************************************************
* *
* FUNCTION *
**************************************************************
int __cdecl main(int argc, char * * argv)
int EAX:4 <RETURN>
int Stack[0x4]:4 argc
char * * Stack[0x8]:4 argv XREF[2]: 000109b0(R),
000109dd(R)
undefined4 Stack[-0x8]:4 local_8 XREF[1]: 00010bcb(R)
int Stack[-0xc]:4 in XREF[5]: 000109f0(W),
000109f3(R),
00010ad4(R),
00010b27(R),
00010b59(R)
int Stack[-0x10]:4 fd XREF[6]: 00010a1f(W),
00010a22(R),
00010aa5(R),
00010ab2(R),
00010ac9(R),
00010b4e(R)
pid_t Stack[-0x14]:4 pid XREF[4]: 00010a6b(W),
00010a6e(R),
00010a8b(R),
00010b6a(R)
int[2] Stack[-0x1c]:8 pipefd XREF[3,3]: 00010a3f(*),
00010a95(R),
00010b42(R),
00010abd(R),
00010b0f(R),
00010b36(R)
char Stack[-0x1d]:1 c XREF[2]: 00010b14(*),
00010b23(*)
int Stack[-0x24]:4 status XREF[2]: 00010b66(*),
00010b75(R)
main XREF[5]: Entry Point(*),
_start:00010866(*), 00010d30,
00010da0(*), 00011f34(*)
0001097d 55 PUSH EBP
0001097e 89 e5 MOV EBP,ESP
00010980 53 PUSH EBX
00010981 83 ec 1c SUB ESP,0x1c
00010984 e8 87 16 CALL <EXTERNAL>::geteuid __uid_t geteuid(void)
00 00
00010989 89 c3 MOV EBX,EAX
0001098b e8 80 16 CALL <EXTERNAL>::geteuid __uid_t geteuid(void)
00 00
00010990 53 PUSH EBX
00010991 50 PUSH EAX
00010992 e8 9d 16 CALL <EXTERNAL>::setreuid int setreuid(__uid_t __ruid, __u
00 00
00010997 83 c4 08 ADD ESP,0x8
0001099a e8 75 16 CALL <EXTERNAL>::getegid __gid_t getegid(void)
00 00
0001099f 89 c3 MOV EBX,EAX
000109a1 e8 6e 16 CALL <EXTERNAL>::getegid __gid_t getegid(void)
00 00
000109a6 53 PUSH EBX
000109a7 50 PUSH EAX
000109a8 e8 9b 16 CALL <EXTERNAL>::setregid int setregid(__gid_t __rgid, __g
00 00
000109ad 83 c4 08 ADD ESP,0x8
000109b0 8b 45 0c MOV EAX,dword ptr [EBP + argv]
000109b3 83 c0 04 ADD EAX,0x4
000109b6 8b 00 MOV EAX,dword ptr [EAX]
000109b8 85 c0 TEST EAX,EAX
000109ba 75 21 JNZ LAB_000109dd
000109bc a1 98 1f MOV EAX,[stderr]
01 00
000109c1 50 PUSH EAX
000109c2 6a 22 PUSH 0x22
000109c4 6a 01 PUSH 0x1
000109c6 68 50 0c PUSH s_Please_specify_the_file_to_verif_00010c50 = "Please specify the file to ve
01 00
000109cb e8 50 16 CALL <EXTERNAL>::fwrite size_t fwrite(void * __ptr, size
00 00
000109d0 83 c4 10 ADD ESP,0x10
000109d3 b8 01 00 MOV EAX,0x1
00 00
000109d8 e9 ee 01 JMP LAB_00010bcb
00 00
LAB_000109dd XREF[1]: 000109ba(j)
000109dd 8b 45 0c MOV EAX,dword ptr [EBP + argv]
000109e0 83 c0 04 ADD EAX,0x4
000109e3 8b 00 MOV EAX,dword ptr [EAX]
000109e5 6a 00 PUSH 0x0
000109e7 50 PUSH EAX
000109e8 e8 43 16 CALL <EXTERNAL>::open int open(char * __file, int __of
00 00
000109ed 83 c4 08 ADD ESP,0x8
000109f0 89 45 f8 MOV dword ptr [EBP + in],EAX
000109f3 83 7d f8 00 CMP dword ptr [EBP + in],0x0
000109f7 79 17 JNS LAB_00010a10
000109f9 68 73 0c PUSH DAT_00010c73 = 6Fh o
01 00
000109fe e8 19 16 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010a03 83 c4 04 ADD ESP,0x4
00010a06 b8 02 00 MOV EAX,0x2
00 00
00010a0b e9 bb 01 JMP LAB_00010bcb
00 00
LAB_00010a10 XREF[1]: 000109f7(j)
00010a10 6a 02 PUSH 0x2
00010a12 68 78 0c PUSH s_/dev/null_00010c78 = "/dev/null"
01 00
00010a17 e8 14 16 CALL <EXTERNAL>::open int open(char * __file, int __of
00 00
00010a1c 83 c4 08 ADD ESP,0x8
00010a1f 89 45 f4 MOV dword ptr [EBP + fd],EAX
00010a22 83 7d f4 00 CMP dword ptr [EBP + fd],0x0
00010a26 79 17 JNS LAB_00010a3f
00010a28 68 73 0c PUSH DAT_00010c73 = 6Fh o
01 00
00010a2d e8 ea 15 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010a32 83 c4 04 ADD ESP,0x4
00010a35 b8 05 00 MOV EAX,0x5
00 00
00010a3a e9 8c 01 JMP LAB_00010bcb
00 00
LAB_00010a3f XREF[1]: 00010a26(j)
00010a3f 8d 45 e8 LEA EAX=>pipefd,[EBP + -0x18]
00010a42 50 PUSH EAX
00010a43 e8 f8 15 CALL <EXTERNAL>::pipe int pipe(int * __pipedes)
00 00
00010a48 83 c4 04 ADD ESP,0x4
00010a4b 85 c0 TEST EAX,EAX
00010a4d 79 17 JNS LAB_00010a66
00010a4f 68 82 0c PUSH DAT_00010c82 = 70h p
01 00
00010a54 e8 c3 15 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010a59 83 c4 04 ADD ESP,0x4
00010a5c b8 03 00 MOV EAX,0x3
00 00
00010a61 e9 65 01 JMP LAB_00010bcb
00 00
LAB_00010a66 XREF[1]: 00010a4d(j)
00010a66 e8 d9 15 CALL <EXTERNAL>::fork __pid_t fork(void)
00 00
00010a6b 89 45 f0 MOV dword ptr [EBP + pid],EAX
00010a6e 83 7d f0 00 CMP dword ptr [EBP + pid],0x0
00010a72 79 17 JNS LAB_00010a8b
00010a74 68 87 0c PUSH DAT_00010c87 = 66h f
01 00
00010a79 e8 9e 15 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010a7e 83 c4 04 ADD ESP,0x4
00010a81 b8 04 00 MOV EAX,0x4
00 00
00010a86 e9 40 01 JMP LAB_00010bcb
00 00
LAB_00010a8b XREF[1]: 00010a72(j)
00010a8b 83 7d f0 00 CMP dword ptr [EBP + pid],0x0
00010a8f 0f 85 8c JNZ LAB_00010b21
00 00 00
00010a95 8b 45 e8 MOV EAX,dword ptr [EBP + pipefd[0]]
00010a98 6a 00 PUSH 0x0
00010a9a 50 PUSH EAX
00010a9b e8 60 15 CALL <EXTERNAL>::dup2 int dup2(int __fd, int __fd2)
00 00
00010aa0 83 c4 08 ADD ESP,0x8
00010aa3 6a 01 PUSH 0x1
00010aa5 ff 75 f4 PUSH dword ptr [EBP + fd]
00010aa8 e8 53 15 CALL <EXTERNAL>::dup2 int dup2(int __fd, int __fd2)
00 00
00010aad 83 c4 08 ADD ESP,0x8
00010ab0 6a 02 PUSH 0x2
00010ab2 ff 75 f4 PUSH dword ptr [EBP + fd]
00010ab5 e8 46 15 CALL <EXTERNAL>::dup2 int dup2(int __fd, int __fd2)
00 00
00010aba 83 c4 08 ADD ESP,0x8
00010abd 8b 45 ec MOV EAX,dword ptr [EBP + pipefd[1]]
00010ac0 50 PUSH EAX
00010ac1 e8 8a 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010ac6 83 c4 04 ADD ESP,0x4
00010ac9 ff 75 f4 PUSH dword ptr [EBP + fd]
00010acc e8 7f 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010ad1 83 c4 04 ADD ESP,0x4
00010ad4 ff 75 f8 PUSH dword ptr [EBP + in]
00010ad7 e8 74 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010adc 83 c4 04 ADD ESP,0x4
00010adf 6a 00 PUSH 0x0
00010ae1 68 8c 0c PUSH s_-asxml_00010c8c = "-asxml"
01 00
00010ae6 68 93 0c PUSH DAT_00010c93 = 74h t
01 00
00010aeb 68 93 0c PUSH DAT_00010c93 = 74h t
01 00
00010af0 e8 17 15 CALL <EXTERNAL>::execlp int execlp(char * __file, char *
00 00
00010af5 83 c4 10 ADD ESP,0x10
00010af8 68 98 0c PUSH s_execlp_00010c98 = "execlp"
01 00
00010afd e8 1a 15 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010b02 83 c4 04 ADD ESP,0x4
00010b05 b8 05 00 MOV EAX,0x5
00 00
00010b0a e9 bc 00 JMP LAB_00010bcb
00 00
LAB_00010b0f XREF[1]: 00010b34(j)
00010b0f 8b 45 ec MOV EAX,dword ptr [EBP + pipefd[1]]
00010b12 6a 01 PUSH 0x1
00010b14 8d 55 e7 LEA EDX=>c,[EBP + -0x19]
00010b17 52 PUSH EDX
00010b18 50 PUSH EAX
00010b19 e8 1e 15 CALL <EXTERNAL>::write ssize_t write(int __fd, void * _
00 00
00010b1e 83 c4 0c ADD ESP,0xc
LAB_00010b21 XREF[1]: 00010a8f(j)
00010b21 6a 01 PUSH 0x1
00010b23 8d 45 e7 LEA EAX=>c,[EBP + -0x19]
00010b26 50 PUSH EAX
00010b27 ff 75 f8 PUSH dword ptr [EBP + in]
00010b2a e8 d5 14 CALL <EXTERNAL>::read ssize_t read(int __fd, void * __
00 00
00010b2f 83 c4 0c ADD ESP,0xc
00010b32 85 c0 TEST EAX,EAX
00010b34 75 d9 JNZ LAB_00010b0f
00010b36 8b 45 ec MOV EAX,dword ptr [EBP + pipefd[1]]
00010b39 50 PUSH EAX
00010b3a e8 11 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010b3f 83 c4 04 ADD ESP,0x4
00010b42 8b 45 e8 MOV EAX,dword ptr [EBP + pipefd[0]]
00010b45 50 PUSH EAX
00010b46 e8 05 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010b4b 83 c4 04 ADD ESP,0x4
00010b4e ff 75 f4 PUSH dword ptr [EBP + fd]
00010b51 e8 fa 14 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010b56 83 c4 04 ADD ESP,0x4
00010b59 ff 75 f8 PUSH dword ptr [EBP + in]
00010b5c e8 ef 14 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010b61 83 c4 04 ADD ESP,0x4
00010b64 6a 00 PUSH 0x0
00010b66 8d 45 e0 LEA EAX=>status,[EBP + -0x20]
00010b69 50 PUSH EAX
00010b6a ff 75 f0 PUSH dword ptr [EBP + pid]
00010b6d e8 b2 14 CALL <EXTERNAL>::waitpid __pid_t waitpid(__pid_t __pid, i
00 00
00010b72 83 c4 0c ADD ESP,0xc
00010b75 8b 45 e0 MOV EAX,dword ptr [EBP + status]
00010b78 c1 f8 08 SAR EAX,0x8
00010b7b 0f b6 c0 MOVZX EAX,AL
00010b7e 83 f8 01 CMP EAX,0x1
00010b81 74 18 JZ LAB_00010b9b
00010b83 83 f8 02 CMP EAX,0x2
00010b86 74 22 JZ LAB_00010baa
00010b88 85 c0 TEST EAX,EAX
00010b8a 75 2d JNZ LAB_00010bb9
00010b8c 68 9f 0c PUSH DAT_00010c9f = 4Fh O
01 00
00010b91 e8 92 14 CALL <EXTERNAL>::puts int puts(char * __s)
00 00
00010b96 83 c4 04 ADD ESP,0x4
00010b99 eb 2b JMP LAB_00010bc6
LAB_00010b9b XREF[1]: 00010b81(j)
00010b9b 68 a4 0c PUSH s_Your_file_is_not_completely_comp_00010ca4 = "Your file is not completely c
01 00
00010ba0 e8 83 14 CALL <EXTERNAL>::puts int puts(char * __s)
00 00
00010ba5 83 c4 04 ADD ESP,0x4
00010ba8 eb 1c JMP LAB_00010bc6
LAB_00010baa XREF[1]: 00010b86(j)
00010baa 68 ca 0c PUSH s_Your_file_contains_errors_00010cca = "Your file contains errors"
01 00
00010baf e8 74 14 CALL <EXTERNAL>::puts int puts(char * __s)
00 00
00010bb4 83 c4 04 ADD ESP,0x4
00010bb7 eb 0d JMP LAB_00010bc6
LAB_00010bb9 XREF[1]: 00010b8a(j)
00010bb9 68 e4 0c PUSH s_I_can't_tell_if_your_file_is_XHT_00010ce4 = "I can't tell if your file is
01 00
00010bbe e8 65 14 CALL <EXTERNAL>::puts int puts(char * __s)
00 00
00010bc3 83 c4 04 ADD ESP,0x4
LAB_00010bc6 XREF[3]: 00010b99(j), 00010ba8(j),
00010bb7(j)
00010bc6 b8 00 00 MOV EAX,0x0
00 00
LAB_00010bcb XREF[6]: 000109d8(j), 00010a0b(j),
00010a3a(j), 00010a61(j),
00010a86(j), 00010b0a(j)
00010bcb 8b 5d fc MOV EBX,dword ptr [EBP + local_8]
00010bce c9 LEAVE
00010bcf c3 RET
According to Ghidra, this decompiles to:
int main(int argc,char **argv)
{
__uid_t __euid;
__uid_t __ruid;
__gid_t __egid;
__gid_t __rgid;
int iVar1;
int __fd;
int iVar2;
__pid_t __pid;
ssize_t sVar3;
uint uVar4;
int status;
char c;
int pipefd [2];
pid_t pid;
int fd;
int in;
__euid = geteuid();
__ruid = geteuid();
setreuid(__ruid,__euid);
__egid = getegid();
__rgid = getegid();
setregid(__rgid,__egid);
if (argv[1] == (char *)0x0) {
fwrite("Please specify the file to verify\n",1,0x22,stderr);
iVar1 = 1;
}
else {
iVar1 = open(argv[1],0);
if (iVar1 < 0) {
perror("open");
iVar1 = 2;
}
else {
__fd = open("/dev/null",2);
if (__fd < 0) {
perror("open");
iVar1 = 5;
}
else {
iVar2 = pipe(pipefd);
if (iVar2 < 0) {
perror("pipe");
iVar1 = 3;
}
else {
__pid = fork();
if (__pid < 0) {
perror("fork");
iVar1 = 4;
}
else if (__pid == 0) {
dup2(pipefd[0],0);
dup2(__fd,1);
dup2(__fd,2);
close(pipefd[1]);
close(__fd);
close(iVar1);
execlp("tidy","tidy","-asxml",0);
perror("execlp");
iVar1 = 5;
}
else {
while( true ) {
sVar3 = read(iVar1,&c,1);
if (sVar3 == 0) break;
write(pipefd[1],&c,1);
}
close(pipefd[1]);
close(pipefd[0]);
close(__fd);
close(iVar1);
waitpid(__pid,&status,0);
uVar4 = status >> 8 & 0xff;
if (uVar4 == 1) {
puts("Your file is not completely compliant");
}
else if (uVar4 == 2) {
puts("Your file contains errors");
}
else if (uVar4 == 0) {
puts("OK!");
}
else {
puts("I can\'t tell if your file is XHTML-compliant");
}
iVar1 = 0;
}
}
}
}
}
return iVar1;
}
It appears it is (to summarize) opening the file passed as the first argument using open in read only mode. If successful, it is forking and using the child process to execute tidy to validate the file is valid XHTML.
Nothing about it stands out to me as an obvious vulnerability that I can use here. I've looked into vulnerabilities for the tidy command, but wasn't really able to find anything useful for this.
Any help would be much appreciated!

In regular C code, execlp("tidy","tidy","-asxml",0); is incorrect because execlp() expects a null pointer argument to mark the end of the argument list.
0 is a null pointer only when used in a pointer context, which the variable part of an argument list is not. On architectures where pointers have the same size and passing convention as int, such as 32-bit Linux, passing 0 and passing NULL generate the same code, so the sloppiness goes unpunished.
In 64-bit mode it is incorrect, but you might get lucky with the x86_64 ABI and have a 64-bit 0 value passed anyway.
In your own code, avoid such pitfalls and use NULL or (char *)0 as the last argument for execlp(). In this listing, however, the C code is Ghidra's reconstruction from the assembly, and in 32-bit mode passing 0 or (char *)0 produces the same code, so there is no problem here.
In your context, execlp("tidy","tidy","-asxml",0); shows another problem: it looks for an executable named tidy in the directories listed in the current PATH and runs it as tidy with the command line argument -asxml. Since the program sets its real uid and gid to its effective ones before doing so, this is a problem if the binary is setuid (for example setuid root): you can create a program named tidy in a directory that appears in PATH before the system directories, and that program will be run with the elevated identity.
Another potential problem is that the program does not check for failure of the system calls setreuid() and setregid(). Although these calls are unlikely to fail for the arguments passed, the manual pages document that it is a grave security error to omit checking for a failure return from setreuid(). In case of failure, the real and effective uid (or gid) is not changed and the process may fork and exec with privileges it was not meant to keep.
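For comparison, here is a minimal sketch of how the two points above (PATH lookup and unchecked return values) could be addressed. It is not the challenge author's code: the function names set_ids_checked and exec_tidy are made up for illustration, and the location /usr/bin/tidy and the replacement PATH value are assumptions about the target system.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed install location of tidy; adjust for the actual system. */
#define TIDY_PATH "/usr/bin/tidy"

/* Set the real IDs to the effective IDs, as the original program does,
 * but report failure instead of ignoring it.
 * Returns 0 on success, -1 on error. */
int set_ids_checked(void)
{
    gid_t egid = getegid();
    uid_t euid = geteuid();

    /* Group first, which is the usual order when changing identities. */
    if (setregid(egid, egid) != 0) {
        perror("setregid");
        return -1;
    }
    if (setreuid(euid, euid) != 0) {
        perror("setreuid");
        return -1;
    }
    return 0;
}

/* Run tidy by absolute path with a fixed environment, so the caller's
 * PATH is never consulted. Only returns (with -1) if the exec fails. */
int exec_tidy(void)
{
    char *const args[] = { "tidy", "-asxml", (char *)0 };
    char *const envp[] = { "PATH=/usr/bin:/bin", (char *)0 };

    execve(TIDY_PATH, args, envp);
    perror("execve");
    return -1;
}
```

In the child branch of the program, exec_tidy() would take the place of the execlp() call, and main would bail out early when set_ids_checked() returns -1.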

Related

How to prevent GCC from replacing return instructions with a jump to the last one

With most optimizations disabled (-O0) GCC compiles this function to the below assembly code:
int dummy_function(int x)
{
if (x == 1) {
return 1;
} else if (x == 2) {
return 2;
} else if (x == 3) {
return 3;
} else if (x == 4) {
return 4;
} else if (x == 5) {
return 5;
} else if (x == 6) {
return 6;
} else {
return 10;
}
}
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
7: 83 7d fc 01 cmp DWORD PTR [rbp-0x4],0x1
b: 75 07 jne 14 <test+0x14>
d: b8 01 00 00 00 mov eax,0x1
12: eb 46 jmp 5a <test+0x5a>
14: 83 7d fc 02 cmp DWORD PTR [rbp-0x4],0x2
18: 75 07 jne 21 <test+0x21>
1a: b8 02 00 00 00 mov eax,0x2
1f: eb 39 jmp 5a <test+0x5a>
21: 83 7d fc 03 cmp DWORD PTR [rbp-0x4],0x3
25: 75 07 jne 2e <test+0x2e>
27: b8 03 00 00 00 mov eax,0x3
2c: eb 2c jmp 5a <test+0x5a>
2e: 83 7d fc 04 cmp DWORD PTR [rbp-0x4],0x4
32: 75 07 jne 3b <test+0x3b>
34: b8 04 00 00 00 mov eax,0x4
39: eb 1f jmp 5a <test+0x5a>
3b: 83 7d fc 05 cmp DWORD PTR [rbp-0x4],0x5
3f: 75 07 jne 48 <test+0x48>
41: b8 05 00 00 00 mov eax,0x5
46: eb 12 jmp 5a <test+0x5a>
48: 83 7d fc 06 cmp DWORD PTR [rbp-0x4],0x6
4c: 75 07 jne 55 <test+0x55>
4e: b8 06 00 00 00 mov eax,0x6
53: eb 05 jmp 5a <test+0x5a>
55: b8 0a 00 00 00 mov eax,0xa
5a: 5d pop rbp
5b: c3 ret
Each return statement is compiled to a jmp to a single shared ret at the end of the function, even when the option that disables instruction merging is used.
How can I prevent GCC from doing this?

'if' test condition in c - does it evaluate?

When calling a function in the test portion of an if statement in C, is it evaluated exactly as if you had called it normally? That is, do all of its effects besides the return value take place and persist?
For example, if I want to include an error check when calling fseek, can I write
if (fseek(file, 0, SEEK_END)) { fprintf(stderr, "File too long"); }
and have it be functionally the same as:
int i = fseek(file, 0, SEEK_END);
if (i) { fprintf(stderr, "File too long"); }
?
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#The-if-Statement
https://www.gnu.org/software/libc/manual/html_node/File-Positioning.html
Yes, this is exactly the same. The only difference is that you won't be able to reuse the result of the call made inside the if statement.
In both cases the call is executed BEFORE the condition (comparison) is tested. To illustrate this, we can look at the machine code produced for the two cases. Note that the output will vary depending on the OS and compiler.
Source file 'a.c':
#include <stdio.h>
int
main(void)
{
FILE *f = fopen("testfile", "r");
long int i = fseek(f, 0, SEEK_END);
if (i)
fprintf(stderr, "Error\n");
return 0;
}
$ gcc -O1 a.c -o a
Source file 'b.c':
#include <stdio.h>
int
main(void)
{
FILE *f = fopen("testfile", "r");
if (fseek(f, 0, SEEK_END))
fprintf(stderr, "Error\n");
return 0;
}
$ gcc -O1 b.c -o b
Note that in both cases I used the option '-O1', which allows the compiler to introduce small optimizations. This is mostly to make the machine code a little cleaner, since without optimization the compiler translates the source quite literally.
$ objdump -Mintel -D a |grep -i main -A20
0000000000001189 <main>:
1189: f3 0f 1e fa endbr64
118d: 48 83 ec 08 sub rsp,0x8
1191: 48 8d 35 6c 0e 00 00 lea rsi,[rip+0xe6c] # 2004 <_IO_stdin_used+0x4>
1198: 48 8d 3d 67 0e 00 00 lea rdi,[rip+0xe67] # 2006 <_IO_stdin_used+0x6>
119f: e8 dc fe ff ff call 1080 <fopen@plt>
# Interesting part
11a4: 48 89 c7 mov rdi,rax # Sets return value of fopen as param 1
11a7: ba 02 00 00 00 mov edx,0x2 # Sets 0x2 (SEEK_END) as param 3
11ac: be 00 00 00 00 mov esi,0x0 # Sets 0 as param 2
11b1: e8 ba fe ff ff call 1070 <fseek@plt> # Calls fseek; the return value ends up in eax
11b6: 85 c0 test eax,eax # Tests the return value
11b8: 75 0a jne 11c4 <main+0x3b> # Jumps to the error path if it is nonzero
# End of interesting part
11ba: b8 00 00 00 00 mov eax,0x0
11bf: 48 83 c4 08 add rsp,0x8
11c3: c3 ret
11c4: 48 8b 0d 55 2e 00 00 mov rcx,QWORD PTR [rip+0x2e55] # 4020 <stderr@@GLIBC_2.2.5>
11cb: ba 06 00 00 00 mov edx,0x6
11d0: be 01 00 00 00 mov esi,0x1
11d5: 48 8d 3d 33 0e 00 00 lea rdi,[rip+0xe33] # 200f <_IO_stdin_used+0xf>
11dc: e8 af fe ff ff call 1090 <fwrite@plt>
11e1: eb d7 jmp 11ba <main+0x31>
11e3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
11ea: 00 00 00
11ed: 0f 1f 00 nop DWORD PTR [rax]
Running objdump on binary 'b' yields almost identical machine code. To sum up, whatever you put in your if statement is evaluated, and the behind-the-scenes result is equivalent whether or not you assign it to a variable first.
Edit:
For reference, this is the output of $ objdump -Mintel -D b |grep -i main -A20:
0000000000001189 <main>:
1189: f3 0f 1e fa endbr64
118d: 48 83 ec 08 sub rsp,0x8
1191: 48 8d 35 6c 0e 00 00 lea rsi,[rip+0xe6c] # 2004 <_IO_stdin_used+0x4>
1198: 48 8d 3d 67 0e 00 00 lea rdi,[rip+0xe67] # 2006 <_IO_stdin_used+0x6>
119f: e8 dc fe ff ff call 1080 <fopen@plt>
# Interesting part
11a4: 48 89 c7 mov rdi,rax
11a7: ba 02 00 00 00 mov edx,0x2
11ac: be 00 00 00 00 mov esi,0x0
11b1: e8 ba fe ff ff call 1070 <fseek@plt>
11b6: 85 c0 test eax,eax
11b8: 75 0a jne 11c4 <main+0x3b>
# End of interesting part
11ba: b8 00 00 00 00 mov eax,0x0
11bf: 48 83 c4 08 add rsp,0x8
11c3: c3 ret
11c4: 48 8b 0d 55 2e 00 00 mov rcx,QWORD PTR [rip+0x2e55] # 4020 <stderr@@GLIBC_2.2.5>
11cb: ba 06 00 00 00 mov edx,0x6
11d0: be 01 00 00 00 mov esi,0x1
11d5: 48 8d 3d 33 0e 00 00 lea rdi,[rip+0xe33] # 200f <_IO_stdin_used+0xf>
11dc: e8 af fe ff ff call 1090 <fwrite@plt>
11e1: eb d7 jmp 11ba <main+0x31>
11e3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
11ea: 00 00 00
11ed: 0f 1f 00 nop DWORD PTR [rax]
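The same point can be checked without reading disassembly. The sketch below uses a hypothetical probe() function as a stand-in for fseek(): it has a visible side effect (a call counter) and returns a status. All names here are made up for the illustration.

```c
static int calls = 0;   /* counts how many times probe() has run */

/* Hypothetical stand-in for fseek(): side effect plus a status result. */
int probe(int status)
{
    calls++;
    return status;
}

/* Form 1: call directly inside the condition. Returns 1 if the branch is taken. */
int check_inline(int status)
{
    if (probe(status))
        return 1;
    return 0;
}

/* Form 2: store the result first, then test it. */
int check_stored(int status)
{
    int rc = probe(status);
    if (rc)
        return 1;
    return 0;
}

/* Lets a caller observe the side effect. */
int call_count(void)
{
    return calls;
}
```

Both forms increment the counter exactly once per call and take the branch under the same condition. The stored form merely keeps rc around for later use.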
The short answer is yes (as in your trivial example); the long answer is maybe.
When the condition is a more complex logical expression, C evaluates it only until the result of the whole expression is determined. The remaining operands are not evaluated.
Examples:
int x = 0;
if(x && foo()) {}
Here foo will not be called, because x is false and therefore the whole expression is false.
int x = 1;
if(x && foo()) {}
Here foo will be called, because x is true and the second operand is needed to determine the result.
This is called short-circuit evaluation, and the && and || operators in C always work this way.
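A small sketch that makes the short-circuit behaviour observable, assuming a foo() that just counts its own calls (the helper names and_with, or_with and foo_call_count are invented for this example):

```c
static int foo_calls = 0;   /* how many times foo() actually ran */

/* Always "true", and records that it was called. */
int foo(void)
{
    foo_calls++;
    return 1;
}

/* foo() runs only when x is nonzero. */
int and_with(int x)
{
    return x && foo();
}

/* foo() runs only when x is zero. */
int or_with(int x)
{
    return x || foo();
}

int foo_call_count(void)
{
    return foo_calls;
}
```

Calling and_with(0) or or_with(1) leaves the counter untouched, while and_with(1) and or_with(0) each increment it once.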

LD_PRELOAD a function with enums and struct

I am trying to LD_PRELOAD a function with declaration like
// header1.h
typedef enum { ... } enum1;
// header2.h
typedef enum { ... } enum2;
typedef struct { ... } Structure1;
enum1 foo(Structure1* str, enum2 val);
Is it possible to use, say, unsigned int instead of the enums and void* instead of Structure1*?
I tried simple code like this, but it doesn't seem to work. Would it be because of a type mismatch?
#define _GNU_SOURCE
#include <stdio.h>
#include <stdarg.h>
#include <dlfcn.h>
typedef unsigned int (*foo_t)(void* ptr, unsigned int e2);
unsigned int foo(void* ptr, unsigned int e2)
{
printf ("foo\n");
/* look up the next definition of foo after this library */
foo_t foo_f = (foo_t) dlsym(RTLD_NEXT, "foo");
unsigned int result = foo_f(ptr, e2);
return result;
}
EDIT :
To get to the actual use case,
I am trying to load
CURLcode Curl_setopt(struct Curl_easy *data, CURLoption option,
va_list param)
from here https://github.com/curl/curl/blob/curl-7_55_1/lib/url.c
but when I run nm, it doesn't seem to find this function
$ nm -D /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0 | grep setopt
000000000002fc80 T curl_easy_setopt
0000000000037ac0 T curl_multi_setopt
000000000003ad60 T curl_share_setopt
I tried objdump of curl_easy_setopt which calls Curl_setopt, but there is no call to Curl_setopt here
objdump -D -S -C /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0 --start-address 0x02fc80 --stop-address 0x02fd36
/usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0: file format elf64-x86-64
Disassembly of section .text:
000000000002fc80 <curl_easy_setopt@@CURL_OPENSSL_3>:
2fc80: 48 81 ec d8 00 00 00 sub $0xd8,%rsp
2fc87: 84 c0 test %al,%al
2fc89: 48 89 54 24 30 mov %rdx,0x30(%rsp)
2fc8e: 48 89 4c 24 38 mov %rcx,0x38(%rsp)
2fc93: 4c 89 44 24 40 mov %r8,0x40(%rsp)
2fc98: 4c 89 4c 24 48 mov %r9,0x48(%rsp)
2fc9d: 74 37 je 2fcd6 <curl_easy_setopt@@CURL_OPENSSL_3+0x56>
2fc9f: 0f 29 44 24 50 movaps %xmm0,0x50(%rsp)
2fca4: 0f 29 4c 24 60 movaps %xmm1,0x60(%rsp)
2fca9: 0f 29 54 24 70 movaps %xmm2,0x70(%rsp)
2fcae: 0f 29 9c 24 80 00 00 movaps %xmm3,0x80(%rsp)
2fcb5: 00
2fcb6: 0f 29 a4 24 90 00 00 movaps %xmm4,0x90(%rsp)
2fcbd: 00
2fcbe: 0f 29 ac 24 a0 00 00 movaps %xmm5,0xa0(%rsp)
2fcc5: 00
2fcc6: 0f 29 b4 24 b0 00 00 movaps %xmm6,0xb0(%rsp)
2fccd: 00
2fcce: 0f 29 bc 24 c0 00 00 movaps %xmm7,0xc0(%rsp)
2fcd5: 00
2fcd6: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
2fcdd: 00 00
2fcdf: 48 89 44 24 18 mov %rax,0x18(%rsp)
2fce4: 31 c0 xor %eax,%eax
2fce6: 48 85 ff test %rdi,%rdi
2fce9: b8 2b 00 00 00 mov $0x2b,%eax
2fcee: 74 2e je 2fd1e <curl_easy_setopt@@CURL_OPENSSL_3+0x9e>
2fcf0: 48 8d 84 24 e0 00 00 lea 0xe0(%rsp),%rax
2fcf7: 00
2fcf8: 48 89 e2 mov %rsp,%rdx
2fcfb: c7 04 24 10 00 00 00 movl $0x10,(%rsp)
2fd02: c7 44 24 04 30 00 00 movl $0x30,0x4(%rsp)
2fd09: 00
2fd0a: 48 89 44 24 08 mov %rax,0x8(%rsp)
2fd0f: 48 8d 44 24 20 lea 0x20(%rsp),%rax
2fd14: 48 89 44 24 10 mov %rax,0x10(%rsp)
2fd19: e8 e2 e9 fe ff callq 1e700 <curl_formget@@CURL_OPENSSL_3+0xf2e0>
2fd1e: 48 8b 4c 24 18 mov 0x18(%rsp),%rcx
2fd23: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
2fd2a: 00 00
2fd2c: 75 08 jne 2fd36 <curl_easy_setopt@@CURL_OPENSSL_3+0xb6>
2fd2e: 48 81 c4 d8 00 00 00 add $0xd8,%rsp
2fd35: c3 retq
Curl_setopt() is not an exported symbol (which is why nm -D does not list it), so you can't LD_PRELOAD it. Consider replacing curl_easy_setopt instead, which is the public and always accessible symbol.
As a second reason, the function Curl_setopt() doesn't even exist in more recent libcurl versions.

Editing ELF binary call instruction

I am playing around with manipulating a binary's call functions. I have the below code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void myfunc2(char *str2, char *str1); /* defined after main */
void myfunc(char *str2, char *str1)
{
memcpy(str2 + strlen(str2), str1, strlen(str1));
}
int main(int argc, char **argv)
{
char str1[4] = "tim";
char str2[10] = "hello ";
myfunc((char *)&str2, (char *)&str1);
printf("%s\n", str2);
myfunc2((char *)&str2, (char *)&str1);
printf("%s\n", str2);
return 0;
}
void myfunc2(char *str2, char *str1)
{
memcpy(str2, str1, strlen(str1));
}
I have compiled the binary and using readelf or objdump I can see that my two functions reside at:
46: 000000000040072c 52 FUNC GLOBAL DEFAULT 13 myfunc2
54: 000000000040064d 77 FUNC GLOBAL DEFAULT 13 myfunc
Using the command objdump -D test (test is my binary's name), I can see that main has two callq instructions to my functions. I tried to edit the first one to point to myfunc2 using the above address 72c, but that does not work; it causes the binary to fail.
000000000040069a <main>:
40069a: 55 push %rbp
40069b: 48 89 e5 mov %rsp,%rbp
40069e: 48 83 ec 40 sub $0x40,%rsp
4006a2: 89 7d cc mov %edi,-0x34(%rbp)
4006a5: 48 89 75 c0 mov %rsi,-0x40(%rbp)
4006a9: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
4006b0: 00 00
4006b2: 48 89 45 f8 mov %rax,-0x8(%rbp)
4006b6: 31 c0 xor %eax,%eax
4006b8: c7 45 d0 74 69 6d 00 movl $0x6d6974,-0x30(%rbp)
4006bf: 48 b8 68 65 6c 6c 6f movabs $0x206f6c6c6568,%rax
4006c6: 20 00 00
4006c9: 48 89 45 e0 mov %rax,-0x20(%rbp)
4006cd: 66 c7 45 e8 00 00 movw $0x0,-0x18(%rbp)
4006d3: 48 8d 55 d0 lea -0x30(%rbp),%rdx
4006d7: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006db: 48 89 d6 mov %rdx,%rsi
4006de: 48 89 c7 mov %rax,%rdi
4006e1: e8 67 ff ff ff callq 40064d <myfunc>
4006e6: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006ea: 48 89 c7 mov %rax,%rdi
4006ed: e8 0e fe ff ff callq 400500 <puts@plt>
4006f2: 48 8d 55 d0 lea -0x30(%rbp),%rdx
4006f6: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006fa: 48 89 d6 mov %rdx,%rsi
4006fd: 48 89 c7 mov %rax,%rdi
400700: e8 27 00 00 00 callq 40072c <myfunc2>
400705: 48 8d 45 e0 lea -0x20(%rbp),%rax
400709: 48 89 c7 mov %rax,%rdi
40070c: e8 ef fd ff ff callq 400500 <puts@plt>
400711: b8 00 00 00 00 mov $0x0,%eax
400716: 48 8b 4d f8 mov -0x8(%rbp),%rcx
40071a: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
400721: 00 00
400723: 74 05 je 40072a <main+0x90>
400725: e8 f6 fd ff ff callq 400520 <__stack_chk_fail@plt>
40072a: c9 leaveq
40072b: c3 retq
I suspect I need to do something with calculating the address information through relative location or using the lea/mov instructions.
Any assistance to learn how to modify the call function would be greatly appreciated - please no pointers on editing strings like the howtos all over most of the internet...
In order to rewrite the address, you have to know the exact way the callq instructions are encoded.
Let's take the disassembly output of the first call:
4006e1: e8 67 ff ff ff callq 40064d <myfunc>
4006e6: ...
You can clearly see that the instruction is encoded with 5 bytes. The e8 byte is the instruction opcode, and 67 ff ff ff is its 32-bit operand, stored in little-endian order (that is, 0xffffff67). At this point, one would ask: what does 67 ff ff ff have to do with 0x40064d?
Well, the answer is that e8 encodes a so-called "relative call": the operand is a signed displacement that is added to the address of the next instruction. To retarget the call, you have to calculate the distance between 4006e6 and the function you want to call, and write that distance into these 4 bytes. Had the call been an absolute one (opcode ff, which takes its target from a register or memory operand), the target would be the function address itself rather than a distance.
To prove that this is the case, consider the following arithmetic:
0x004006e6 + 0xffffff67 == 0x10040064d, which truncated to 32 bits is 0x0040064d, the address of myfunc.
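The calculation can be sketched in a few lines of C. The helper names call_rel32 and rel32_bytes are invented for this illustration; the only facts relied on are the ones above: the e8 instruction is 5 bytes long and its operand is a little-endian displacement from the next instruction.

```c
#include <stdint.h>

/* Displacement to store in an e8 (call rel32) instruction located at
 * call_addr so that it reaches target. */
int32_t call_rel32(uint64_t call_addr, uint64_t target)
{
    uint64_t next = call_addr + 5;   /* opcode byte + 4 operand bytes */
    return (int32_t)(uint32_t)(target - next);
}

/* Write the displacement in the little-endian byte order used in the file. */
void rel32_bytes(int32_t disp, uint8_t out[4])
{
    uint32_t u = (uint32_t)disp;
    out[0] = (uint8_t)(u & 0xff);
    out[1] = (uint8_t)((u >> 8) & 0xff);
    out[2] = (uint8_t)((u >> 16) & 0xff);
    out[3] = (uint8_t)((u >> 24) & 0xff);
}
```

With the addresses from the listing, call_rel32(0x4006e1, 0x40064d) reproduces the existing operand 0xffffff67, and call_rel32(0x4006e1, 0x40072c) gives 0x46, so pointing the first call at myfunc2 would mean changing its bytes to e8 46 00 00 00.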

Why does this code prevent gcc & llvm from tail-call optimization?

I have tried the following code with gcc 4.4.5 on Linux and with gcc-llvm on Mac OS X (Xcode 4.2.1). Below are the source and the generated disassembly of the relevant functions. (Added: compiled with gcc -O2 main.c)
#include <stdio.h>
__attribute__((noinline))
static void g(long num)
{
long m, n;
printf("%p %ld\n", &m, n);
return g(num-1);
}
__attribute__((noinline))
static void h(long num)
{
long m, n;
printf("%ld %ld\n", m, n);
return h(num-1);
}
__attribute__((noinline))
static void f(long * num)
{
scanf("%ld", num);
g(*num);
h(*num);
return f(num);
}
int main(void)
{
printf("int:%lu long:%lu unsigned:%lu\n", sizeof(int), sizeof(long), sizeof(unsigned));
long num;
f(&num);
return 0;
}
08048430 <g>:
8048430: 55 push %ebp
8048431: 89 e5 mov %esp,%ebp
8048433: 53 push %ebx
8048434: 89 c3 mov %eax,%ebx
8048436: 83 ec 24 sub $0x24,%esp
8048439: 8d 45 f4 lea -0xc(%ebp),%eax
804843c: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
8048443: 00
8048444: 89 44 24 04 mov %eax,0x4(%esp)
8048448: c7 04 24 d0 85 04 08 movl $0x80485d0,(%esp)
804844f: e8 f0 fe ff ff call 8048344 <printf@plt>
8048454: 8d 43 ff lea -0x1(%ebx),%eax
8048457: e8 d4 ff ff ff call 8048430 <g>
804845c: 83 c4 24 add $0x24,%esp
804845f: 5b pop %ebx
8048460: 5d pop %ebp
8048461: c3 ret
8048462: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
8048469: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
08048470 <h>:
8048470: 55 push %ebp
8048471: 89 e5 mov %esp,%ebp
8048473: 83 ec 18 sub $0x18,%esp
8048476: 66 90 xchg %ax,%ax
8048478: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
804847f: 00
8048480: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
8048487: 00
8048488: c7 04 24 d8 85 04 08 movl $0x80485d8,(%esp)
804848f: e8 b0 fe ff ff call 8048344 <printf@plt>
8048494: eb e2 jmp 8048478 <h+0x8>
8048496: 8d 76 00 lea 0x0(%esi),%esi
8048499: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
080484a0 <f>:
80484a0: 55 push %ebp
80484a1: 89 e5 mov %esp,%ebp
80484a3: 53 push %ebx
80484a4: 89 c3 mov %eax,%ebx
80484a6: 83 ec 14 sub $0x14,%esp
80484a9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
80484b0: 89 5c 24 04 mov %ebx,0x4(%esp)
80484b4: c7 04 24 e1 85 04 08 movl $0x80485e1,(%esp)
80484bb: e8 94 fe ff ff call 8048354 <__isoc99_scanf@plt>
80484c0: 8b 03 mov (%ebx),%eax
80484c2: e8 69 ff ff ff call 8048430 <g>
80484c7: 8b 03 mov (%ebx),%eax
80484c9: e8 a2 ff ff ff call 8048470 <h>
80484ce: eb e0 jmp 80484b0 <f+0x10>
We can see that g() and h() are mostly identical, except for the & (address-of) operator applied to the argument m of printf() (and the irrelevant difference between %ld and %p).
However, h() is tail-call optimized and g() is not. Why?
In g(), you're taking the address of a local variable and passing it to a function. A "sufficiently smart compiler" should realize that printf does not store that pointer. Instead, gcc and llvm assume that printf might store the pointer somewhere, so the call frame containing m might need to be "live" further down in the recursion. Therefore, no TCO.
It's the & that does it. It tells the compiler that m must be stored on the stack. Because its address is passed to printf, the compiler has to assume that m might be accessed by somebody else later, so the frame can only be cleaned from the stack after the recursive call to g returns.
In this particular case, since printf is known to the compiler (and it knows that printf does not save pointers), it could probably be taught to perform this optimization.
For more info on this, look up 'escape analysis'.
