Is printf a static function in C? - c

I'm pretty new to C, and I know that static functions can only be used within the same object file.
Something that still confuses me though, is how if I hover over a call to printf in my IDE it tells me that printf is a static function, when I can perfectly use printf in multiple object files without any problems?
Why is that?
Edit: I'm using Visual Studio Code the library is stdio.h and compiling using GCC
#include <stdio.h>
int main()
{
printf("Hello, world!");
return 0;
}
Hovering over printf would give that hint
Edit 2: if static inline functions are different from inline functions how so? I don't see how making a function inline would change the fact that it's only accessible from the same translation unit
Edit 3: per request, here's the definition of printf in the stdio.h header
__mingw_ovr
__attribute__((__format__ (gnu_printf, 1, 2))) __MINGW_ATTRIB_NONNULL(1)
int printf (const char *__format, ...)
{
int __retval;
__builtin_va_list __local_argv; __builtin_va_start( __local_argv, __format );
__retval = __mingw_vfprintf( stdout, __format, __local_argv );
__builtin_va_end( __local_argv );
return __retval;
}
screenshot of the printf definition in my IDE

Thanks for all the updates. Your IDE is showing you implementation details of your C library that you're not supposed to have to worry about. That is arguably a bug in your IDE but there's nothing wrong with the C library.
You are correct to think that a function declared static is visible only within the translation unit where it is defined. And that is true whether or not the function is also marked inline. However, this particular static inline function is defined inside stdio.h, so every translation unit that includes stdio.h can see it. (There is a separate copy of the inline in each TU. This is technically a conformance violation since, as chux surmises, it means that &printf in one translation unit will not compare equal to &printf in another. However, this is very unlikely to cause problems for a program that isn't an ISO C conformance tester.)
As "Adrian Mole" surmised in the comments on the question, the purpose of this inline function is, more or less, to rewrite
printf("%s %d %p", string, integer, pointer);
into
fprintf(stdout, "%s %d %p", string, integer, pointer);
(All the stuff with __builtin_va_start and __mingw_vfprintf is because of limitations in how variadic functions work in C. The effect is the same as what I showed above, but the generated assembly language will not be nearly as tidy.)
Update 2022-12-01: It's worse than "the generated assembly language will not be nearly as tidy". I experimented with all of the x86 compilers supported by godbolt.org and none of them will inline a function that takes a variable number of arguments, even if you try to force it. Most silently ignore the force-inlining directive; GCC gets one bonus point for actually saying it refuses to do this:
test.c:4:50: error: function ‘xprintf’ can never be inlined
because it uses variable argument lists
In consequence, every program compiled against this version of MinGW libc, in which printf is called from more than one .c file, will have multiple copies of the following glob of assembly embedded in its binary. This is bad just because of cache pollution; it would be better to have a single copy in the C library proper -- which is exactly what printf normally is.
printf:
mov QWORD PTR [rsp+8], rcx
mov QWORD PTR [rsp+16], rdx
mov QWORD PTR [rsp+24], r8
mov QWORD PTR [rsp+32], r9
push rbx
push rsi
push rdi
sub rsp, 48
mov rdi, rcx
lea rsi, QWORD PTR f$[rsp+8]
mov ecx, 1
call __acrt_iob_func
mov rbx, rax
call __local_stdio_printf_options
xor r9d, r9d
mov QWORD PTR [rsp+32], rsi
mov r8, rdi
mov rdx, rbx
mov rcx, QWORD PTR [rax]
call __stdio_common_vfprintf
add rsp, 48
pop rdi
pop rsi
pop rbx
ret 0
(Probably not exactly this assembly, this is what godbolt's "x64 MSVC 19.latest" produces with optimization option /O2, godbolt doesn't seem to have any MinGW compilers. But you get the idea.)

__mingw_ovr is normally defined as
#define __mingw_ovr static __attribute__ ((__unused__)) __inline__ __cdecl
It appears that this definition of printf is a violation of the C standard. The standard says
7.1.2 Standard headers
6 Any declaration of a library function shall have external linkage

Related

Can ASLR randomization be different per function?

I have the following code snippet:
#include <inttypes.h>
#include <stdio.h>
uint64_t
esp_func(void)
{
__asm__("movl %esp, %eax");
}
int
main()
{
uint32_t esp = 0;
__asm__("\t movl %%esp,%0" : "=r"(esp));
printf("esp: 0x%08x\n", esp);
printf("esp: 0x%08lx\n", esp_func());
return 0;
}
Which prints the following upon multiple executions:
❯ clang -g esp.c && ./a.out
esp: 0xbd3b7670
esp: 0x7f8c1c2c5140
❯ clang -g esp.c && ./a.out
esp: 0x403c9040
esp: 0x7f9ee8bd8140
❯ clang -g esp.c && ./a.out
esp: 0xb59b70f0
esp: 0x7fe301f8c140
❯ clang -g esp.c && ./a.out
esp: 0x6efa4110
esp: 0x7fd95941f140
❯ clang -g esp.c && ./a.out
esp: 0x144e72b0
esp: 0x7f246d4ef140
esp_func shows that ASLR is active with 28 bits of entropy, which makes sense on my modern Linux kernel.
What doesn't make sense is the first value: why is it drastically different?
I took a look at the assembly and it looks weird...
// From main
0x00001150 55 push rbp
0x00001151 4889e5 mov rbp, rsp
0x00001154 4883ec10 sub rsp, 0x10
0x00001158 c745fc000000. mov dword [rbp-0x4], 0
0x0000115f c745f8000000. mov dword [rbp-0x8], 0
0x00001166 89e0 mov eax, esp ; Move esp to eax
0x00001168 8945f8 mov dword [rbp-0x8], eax ; Assign eax to my variable `esp`
0x0000116b 8b75f8 mov esi, dword [rbp-0x8]
0x0000116e 488d3d8f0e00. lea rdi, [0x00002004]
0x00001175 b000 mov al, 0
0x00001177 e8b4feffff call sym.imp.printf ; For whatever reason, the value in [rbp-0x8]
; is assigned here. Why?
// From esp_func
0x00001140 55 push rbp
0x00001141 4889e5 mov rbp, rsp
0x00001144 89e0 mov eax, esp ; Move esp to eax (same instruction as above)
0x00001146 488b45f8 mov rax, qword [rbp-0x8] ; This changes everything. What is this?
0x0000114a 5d pop rbp
0x0000114b c3 ret
0x0000114c 0f1f4000 nop dword [rax]
So my question is, what is in [rbp-0x8], how did it get there, and why are the two values different?
No, stack ASLR happens once at program startup. Relative adjustments to RSP between functions are fixed at compile time, and are just the small constants to make space for a function's local vars. (C99 variable-length arrays and alloca do runtime-variable adjustments to RSP, but not random.)
Your program contains Undefined Behaviour and isn't actually printing RSP; instead some stack address left in a register by the previous printf call (which appears to be a stack address, so its high bits do vary with ASLR). It tells you nothing about stack-pointer differences between functions, just how not to use GNU C inline asm.
The first value is printing the current ESP correctly, but that's only the low 32 bits of the 64-bit RSP.
Falling off the end of a non-void function is not safe, and using the return value is Undefined Behaviour. Any caller that uses the return value of esp_func() necessarily would trigger UB, so the compiler is free to leave whatever it wants in RAX.
If you want to write mov %rsp, %rax / ret, then write that function in pure asm, or mov to an "=r"(tmp) local variable. Using GNU C inline asm to modify RAX without telling the compiler about it doesn't change anything; the compiler still sees this as a function with no return value.
MSVC inline asm is different: it is apparently supported to use _asm{ mov eax, 123 } or something and then fall off the end of a non-void function, and MSVC will respect that as the function return value even when inlining. GNU C inline asm doesn't need silly hacks like that: if you want your asm to interact with C values, use Extended asm with an output constraint like you're doing in main. Remember that GNU C inline asm is not parsed by the compiler, just emit the template string as part of the compiler's asm output to be assembled.
I don't know exactly why clang is reloading a return value from the stack, but that's just an artifact of clang internals and how it does code-gen with optimization disabled. But it's allowed to do this because of the undefined behaviour. It is a non-void function, so it needs to have a return value. The simplest thing would be to just emit a ret, and is what some compilers happen to do with optimization enabled, but even that doesn't fix the problem because of inter-procedural optimization.
It's actually Undefined Behaviour in C to use the return value of a function that didn't return one. This applies at the C level; using inline asm that modifies a register without telling the compiler about it doesn't change anything as far as the compiler is concerned. Therefore your program as a whole contains UB, because it passes the result to printf. That's why the compiler is allowed to compile this way: your code was already broken. In practice it's just returning some garbage from stack memory.
TL:DR: this is not a valid way to emit mov %rsp, %rax / ret as the asm definition for a function.
(C++ strengthens this to it being UB to fall off the end in the first place, but in C it's legal as long as the caller doesn't use the return value. If you compile the same source as C++ with optimization, g++ doesn't even emit a ret instruction after your inline asm template. Probably this is to support C's default-int return type if you declare a function without a return type.)
This UB is also why your modified version from comments (with the printf format strings fixed), compiled with optimization enabled (https://godbolt.org/z/sE7e84) prints "surprisingly" different "RSP" values: the 2nd one isn't using RSP at all.
#include <inttypes.h>
#include <stdio.h>
uint64_t __attribute__((noinline)) rsp_func(void)
{
__asm__("movq %rsp, %rax");
} // UB if return value used
int main()
{
uint64_t rsp = 0;
__asm__("\t movq %%rsp,%0" : "=r"(rsp));
printf("rsp: 0x%08lx\n", rsp);
printf("rsp: 0x%08lx\n", rsp_func()); // UB here
return 0;
}
Output example:
Compiler stderr
<source>:7:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
1 warning generated.
Program returned: 0
Program stdout
rsp: 0x7fff5c472f30
rsp: 0x7f4b811b7170
clang -O3 asm output shows that the compiler-visible UB was a problem. Even though you used noinline, the compiler can still see the function body and try to do inter-procedural optimization. In this case, the UB led it to just give up and not emit a mov %rsp, %rsi between call rsp_func and call printf, so it's printing whatever value the previous printf happened to leave in RSI
# from the Godbolt link
rsp_func: # #rsp_func
mov rax, rsp
ret
main: # #main
push rax
mov rsi, rsp
mov edi, offset .L.str
xor eax, eax
call printf
call rsp_func # return value ignored because of UB.
mov edi, offset .L.str
xor eax, eax
call printf # printf("0x%08lx\n", garbage in RSI left from last printf)
xor eax, eax
pop rcx
ret
.L.str:
.asciz "rsp: 0x%08lx\n"
GNU C Basic asm (without constraints) is not useful for anything (except the body of a __attribute__((naked)) function).
Don't assume the compiler will do what you expect when there is UB visible to it at compile time. (When UB isn't visible at compile time, the compiler has to make code that would work for some callers or callees, and you get the asm you expected. But compile-time-visible UB means all bets are off.)

Ambiguous behaviour of strcmp()

Please note that I have checked the relevant questions to this title, but from my point of view they are not related to this question.
Initially I thought that program1 and program2 would give me the same result.
//Program 1
char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));
//Output: -4
//Program 2
printf("%d", strcmp("abcd", "efgh"));
//Output: -1
Only difference that I can spot is that in the program2 I have passed string literal, while in program I've passed char * as the argument of the strcmp() function.
Why there is a difference between the behaviour of these seemingly same program?
Platform: Linux mint
compiler: g++
Edit: Actually the program1 always prints the difference of ascii code of the first mismatched characters, but the program2 print -1 if the ascii code of the first mismatched character in string2 is greater than that of string1 and vice versa.
This is your C code:
int x1()
{
char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));
}
int x2()
{
printf("%d", strcmp("abcd", "efgh"));
}
And this is the generated assembly output for both functions:
.LC0:
.string "abcd"
.LC1:
.string "efgh"
.LC2:
.string "%d"
x1:
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], OFFSET FLAT:.LC0
mov QWORD PTR [rbp-16], OFFSET FLAT:.LC1
mov rdx, QWORD PTR [rbp-16]
mov rax, QWORD PTR [rbp-8]
mov rsi, rdx
mov rdi, rax
call strcmp // the strcmp function is actually called
mov esi, eax
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
nop
leave
ret
x2:
push rbp
mov rbp, rsp
mov esi, -1 // strcmp is never called, the compiler
// knows what the result will be and it just
// uses -1
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
nop
pop rbp
ret
When the compiler sees strcmp("abcd", "efgh") it knows the result beforehand, because it knows that "abcd" comes before "efgh".
But if it sees strcmp(a,b) it does not know and hence generates code that actually calls strcmp.
With another compiler or with different compiler settings things could be different. You really shouldn't care about such details at least at a beginner's level.
It is indeed surprising that strcmp returns 2 different values for these calls, but it is not incompatible with the C Standard:
strcmp() returns a negative value if the first string is lexicographically before the second string. Both -4 and -1 are negative values.
As pointed by others, the code generated for the different calls is different:
the compiler generates a call to the library function in the first program
the compiler is able to determine the result of the comparison and generates an explicit result of -1 for the second case where both arguments are string literals.
In order to perform this compile time evaluation, strcmp must be defined in a subtile way in <string.h> so the compiler can determine that the program refers to the C library's implementation and not an alternative that might behave differently. Tracing the corresponding prototype in recent GNU libc include files is a bit difficult with a number of nested macros eventually leading to a hidden prototype.
Note that more recent versions of both gcc and clang will perform the optimisation in both cases as can be tested on Godbolt Compiler Explorer, but neither combines this optmisation with that of printf to generate the even more compact code puts("-1");. They seem to convert printf to puts only for string literal formats without arguments.
I believe (would need to see (and interpret) machine code) one version works without calling code in the library (as if you wrote printf("%d", -1);).

Assembly: Purpose of loading the effective address before a call to a function?

Source C Code:
int main()
{
int i;
for(i=0, i < 10; i++)
{
printf("Hello World!\n");
}
}
Dump of Intel syntax x86 assembler code for function main:
1. 0x000055555555463a <+0>: push rbp
2. 0x000055555555463b <+1>: mov rbp,rsp
3. 0x000055555555463e <+4>: sub rsp,0x10
4. 0x0000555555554642 <+8>: mov DWORD PTR [rbp-0x4],0x0
5. 0x0000555555554649 <+15>: jmp 0x55555555465b <main+33>
6. 0x000055555555464b <+17>: lea rdi,[rip+0xa2] # 0x5555555546f4
7. 0x0000555555554652 <+24>: call 0x555555554510 <puts#plt>
8. 0x0000555555554657 <+29>: add DWORD PTR [rbp-0x4],0x1
9. 0x000055555555465b <+33>: cmp DWORD PTR [rbp-0x4],0x9
10. 0x000055555555465f <+37>: jle 0x55555555464b <main+17>
11. 0x0000555555554661 <+39>: mov eax,0x0
12. 0x0000555555554666 <+44>: leave
13. 0x0000555555554667 <+45>: ret
I'm currently working through "Hacking, The Art of Exploitation 2nd Edition by Jon Erickson", and I'm just starting to tackle assembly.
I have a few questions about the translation of the provided C code to Assembly, but I am mainly wondering about my first question.
1st Question: What is the purpose of line 6? (lea rdi,[rip+0xa2]).
My current working theory, is that this is used to save where the next instructions will jump to in order to track what is going on. I believe this line correlates with the printf function in the source C code.
So essentially, its loading the effective address of rip+0xa2 (0x5555555546f4) into the register rdi, to simply track where it will jump to for the printf function?
2nd Question: What is the purpose of line 11? (mov eax,0x0?)
I do not see a prior use of the register, EAX and am not sure why it needs to be set to 0.
The LEA puts a pointer to the string literal into a register, as the first arg for puts. The search term you're looking for is "calling convention" and/or ABI. (And also RIP-relative addressing). Why is the address of static variables relative to the Instruction Pointer?
The small offset between code and data (only +0xa2) is because the .rodata section gets linked into the same ELF segment as .text, and your program is tiny. (Newer gcc + ld versions will put it in a separate page so it can be non-executable.)
The compiler can't use a shorter more efficient mov edi, address in position-independent code in your Linux PIE executable. It would do that with gcc -fno-pie -no-pie
mov eax,0 implements the implicit return 0 at the end of main that C99 and C++ guarantee. EAX is the return-value register in all calling conventions.
If you don't use gcc -O2 or higher, you won't get peephole optimizations like xor-zeroing (xor eax,eax).
This:
lea rdi,[rip+0xa2]
Is a typical position independent LEA, putting the string address into a register (instead of loading from that memory address).
Your executable is position independent, meaning that it can be loaded at runtime at any address. Therefore, the real address of the argument to be passed to puts() needs to be calculated at runtime every single time, since the base address of the program could be different each time. Also, puts() is used instead of printf() because the compiler optimized the call since there is no need to format anything.
In this case, the binary was most probably loaded with the base address 0x555555554000. The string to use is stored in your binary at offset 0x6f4. Since the next instruction is at offset 0x652, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x6f4 - 0x652) = rip + 0xa2, which is what you see above. See this answer of mine for another example.
The purpose of:
mov eax,0x0
Is to set the return value of main(). In Intel x86, the calling convention is to return values in the rax register (eax if the value is 32 bits, which is true in this case since main returns an int). See the table entry for x86-64 at the end of this page.
Even if you don't add an explicit return statement, main() is a special function, and the compiler will add a default return 0 for you.
If you add some debug data and symbols to the assembly everything will be easier. It is also easier to read the code if you add some optimizations.
There is a very useful tool godbolt and your example https://godbolt.org/z/9sRFmU
On the asm listing there you can clearly see that that lines loads the address of the string literal which will be then printed by the function.
EAX is considered volatile and main by default returns zero and thats the reason why it is zeroed.
The calling convention is explained here: https://en.wikipedia.org/wiki/X86_calling_conventions
Here you have more interesting cases https://godbolt.org/z/M4MeGk

Mysterious printf argument when disassembling c program

So I have this c code:
#include <stdio.h>
int main(void)
{
int a;
int b;
int c;
a=b=c=5;
printf("Hi%d%d%dHi",a,b,c);
}
I compiled it on ubuntu with:
gcc program.c -o program -ggdb -m32 -O2
And then disassembled it with:
objdump -M intel program -d
And in main printf() gets called like this:
mov DWORD PTR [esp+0x10],0x5
mov DWORD PTR [esp+0xc],0x5
mov DWORD PTR [esp+0x8],0x5
mov DWORD PTR [esp+0x4],0x8048500
mov DWORD PTR [esp],0x1
call 8048330 <__printf_chk#plt>
What I am wondering right now is what this means:
mov DWORD PTR [esp],0x1
I know what the first 4 mov instructions are for, but I just can't figure out why a '1' gets pushed onto the stack. Also this mov only occurs when optimization is turned on. Any ideas?
The GNU C library (glibc) will use __printf_chk instead of printf if you (or the the compiler) defines _FORTIFY_SOURCE and optimization is enabled. The _chk version of the function behaves just like the function it replaces except it's supposed to check for stack overflow and maybe validate the arguments. The extra first argument indicates how much checking and validation should occur.
Looking at the actual glibc implmenation it appears that doesn't do any additional stack checking over what the compiler automatic provides (and so shouldn't be necessary) and the validation of arguments is very minimal. It will check that %n only appears on read-only format strings, and checks that if the special %m$ argument specifiers are used that they're used for all arguments without any gaps.

Alloca implementation

How does one implement alloca() using inline x86 assembler in languages like D, C, and C++? I want to create a slightly modified version of it, but first I need to know how the standard version is implemented. Reading the disassembly from compilers doesn't help because they perform so many optimizations, and I just want the canonical form.
Edit: I guess the hard part is that I want this to have normal function call syntax, i.e. using a naked function or something, make it look like the normal alloca().
Edit # 2: Ah, what the heck, you can assume that we're not omitting the frame pointer.
implementing alloca actually requires compiler assistance. A few people here are saying it's as easy as:
sub esp, <size>
which is unfortunately only half of the picture. Yes that would "allocate space on the stack" but there are a couple of gotchas.
if the compiler had emitted code
which references other variables
relative to esp instead of ebp
(typical if you compile with no
frame pointer). Then those
references need to be adjusted. Even with frame pointers, compilers do this sometimes.
more importantly, by definition, space allocated with alloca must be
"freed" when the function exits.
The big one is point #2. Because you need the compiler to emit code to symmetrically add <size> to esp at every exit point of the function.
The most likely case is the compiler offers some intrinsics which allow library writers to ask the compiler for the help needed.
EDIT:
In fact, in glibc (GNU's implementation of libc). The implementation of alloca is simply this:
#ifdef __GNUC__
# define __alloca(size) __builtin_alloca (size)
#endif /* GCC. */
EDIT:
after thinking about it, the minimum I believe would be required would be for the compiler to always use a frame pointer in any functions which uses alloca, regardless of optimization settings. This would allow all locals to be referenced through ebp safely and the frame cleanup would be handled by restoring the frame pointer to esp.
EDIT:
So i did some experimenting with things like this:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#define __alloca(p, N) \
do { \
__asm__ __volatile__( \
"sub %1, %%esp \n" \
"mov %%esp, %0 \n" \
: "=m"(p) \
: "i"(N) \
: "esp"); \
} while(0)
int func() {
char *p;
__alloca(p, 100);
memset(p, 0, 100);
strcpy(p, "hello world\n");
printf("%s\n", p);
}
int main() {
func();
}
which unfortunately does not work correctly. After analyzing the assembly output by gcc. It appears that optimizations get in the way. The problem seems to be that since the compiler's optimizer is entirely unaware of my inline assembly, it has a habit of doing the things in an unexpected order and still referencing things via esp.
Here's the resultant ASM:
8048454: push ebp
8048455: mov ebp,esp
8048457: sub esp,0x28
804845a: sub esp,0x64 ; <- this and the line below are our "alloc"
804845d: mov DWORD PTR [ebp-0x4],esp
8048460: mov eax,DWORD PTR [ebp-0x4]
8048463: mov DWORD PTR [esp+0x8],0x64 ; <- whoops! compiler still referencing via esp
804846b: mov DWORD PTR [esp+0x4],0x0 ; <- whoops! compiler still referencing via esp
8048473: mov DWORD PTR [esp],eax ; <- whoops! compiler still referencing via esp
8048476: call 8048338 <memset#plt>
804847b: mov eax,DWORD PTR [ebp-0x4]
804847e: mov DWORD PTR [esp+0x8],0xd ; <- whoops! compiler still referencing via esp
8048486: mov DWORD PTR [esp+0x4],0x80485a8 ; <- whoops! compiler still referencing via esp
804848e: mov DWORD PTR [esp],eax ; <- whoops! compiler still referencing via esp
8048491: call 8048358 <memcpy#plt>
8048496: mov eax,DWORD PTR [ebp-0x4]
8048499: mov DWORD PTR [esp],eax ; <- whoops! compiler still referencing via esp
804849c: call 8048368 <puts#plt>
80484a1: leave
80484a2: ret
As you can see, it isn't so simple. Unfortunately, I stand by my original assertion that you need compiler assistance.
It would be tricky to do this - in fact, unless you have enough control over the compiler's code generation it cannot be done entirely safely. Your routine would have to manipulate the stack, such that when it returned everything was cleaned, but the stack pointer remained in such a position that the block of memory remained in that place.
The problem is that unless you can inform the compiler that the stack pointer is has been modified across your function call, it may well decide that it can continue to refer to other locals (or whatever) through the stack pointer - but the offsets will be incorrect.
For the D programming language, the source code for alloca() comes with the download. How it works is fairly well commented. For dmd1, it's in /dmd/src/phobos/internal/alloca.d. For dmd2, it's in /dmd/src/druntime/src/compiler/dmd/alloca.d.
The C and C++ standards don't specify that alloca() has to the use the stack, because alloca() isn't in the C or C++ standards (or POSIX for that matter)¹.
A compiler may also implement alloca() using the heap. For example, the ARM RealView (RVCT) compiler's alloca() uses malloc() to allocate the buffer (referenced on their website here), and also causes the compiler to emit code that frees the buffer when the function returns. This doesn't require playing with the stack pointer, but still requires compiler support.
Microsoft Visual C++ has a _malloca() function that uses the heap if there isn't enough room on the stack, but it requires the caller to use _freea(), unlike _alloca(), which does not need/want explicit freeing.
(With C++ destructors at your disposal, you can obviously do the cleanup without compiler support, but you can't declare local variables inside an arbitrary expression so I don't think you could write an alloca() macro that uses RAII. Then again, apparently you can't use alloca() in some expressions (like function parameters) anyway.)
¹ Yes, it's legal to write an alloca() that simply calls system("/usr/games/nethack").
Continuation Passing Style Alloca
Variable-Length Array in pure ISO C++. Proof-of-Concept implementation.
Usage
void foo(unsigned n)
{
cps_alloca<Payload>(n,[](Payload *first,Payload *last)
{
fill(first,last,something);
});
}
Core Idea
template<typename T,unsigned N,typename F>
auto cps_alloca_static(F &&f) -> decltype(f(nullptr,nullptr))
{
T data[N];
return f(&data[0],&data[0]+N);
}
template<typename T,typename F>
auto cps_alloca_dynamic(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
vector<T> data(n);
return f(&data[0],&data[0]+n);
}
template<typename T,typename F>
auto cps_alloca(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
switch(n)
{
case 1: return cps_alloca_static<T,1>(f);
case 2: return cps_alloca_static<T,2>(f);
case 3: return cps_alloca_static<T,3>(f);
case 4: return cps_alloca_static<T,4>(f);
case 0: return f(nullptr,nullptr);
default: return cps_alloca_dynamic<T>(n,f);
}; // mpl::for_each / array / index pack / recursive bsearch / etc variacion
}
LIVE DEMO
cps_alloca on github
alloca is directly implemented in assembly code.
That's because you cannot control stack layout directly from high level languages.
Also note that most implementation will perform some additional optimization like aligning the stack for performance reasons.
The standard way of allocating stack space on X86 looks like this:
sub esp, XXX
Whereas XXX is the number of bytes to allcoate
Edit:
If you want to look at the implementation (and you're using MSVC) see alloca16.asm and chkstk.asm.
The code in the first file basically aligns the desired allocation size to a 16 byte boundary. Code in the 2nd file actually walks all pages which would belong to the new stack area and touches them. This will possibly trigger PAGE_GAURD exceptions which are used by the OS to grow the stack.
You can examine sources of an open-source C compiler, like Open Watcom, and find it yourself
If you can't use c99's Variable Length Arrays, you can use a compound literal cast to a void pointer.
#define ALLOCA(sz) ((void*)((char[sz]){0}))
This also works for -ansi (as a gcc extension) and even when it is a function argument;
some_func(&useful_return, ALLOCA(sizeof(struct useless_return)));
The downside is that when compiled as c++, g++>4.6 will give you an error: taking address of temporary array ... clang and icc don't complain though
Alloca is easy, you just move the stack pointer up; then generate all the read/writes to point to this new block
sub esp, 4
What we want to do is something like that:
void* alloca(size_t size) {
<sp> -= size;
return <sp>;
}
In Assembly (Visual Studio 2017, 64bit) it looks like:
;alloca.asm
_TEXT SEGMENT
PUBLIC alloca
alloca PROC
sub rsp, rcx ;<sp> -= size
mov rax, rsp ;return <sp>;
ret
alloca ENDP
_TEXT ENDS
END
Unfortunately our return pointer is the last item on the stack, and we do not want to overwrite it. Additionally we need to take care for the alignment, ie. round size up to multiple of 8. So we have to do this:
;alloca.asm
_TEXT SEGMENT
PUBLIC alloca
alloca PROC
;round up to multiple of 8
mov rax, rcx
mov rbx, 8
xor rdx, rdx
div rbx
sub rbx, rdx
mov rax, rbx
mov rbx, 8
xor rdx, rdx
div rbx
add rcx, rdx
;increase stack pointer
pop rbx
sub rsp, rcx
mov rax, rsp
push rbx
ret
alloca ENDP
_TEXT ENDS
END
my_alloca: ; void *my_alloca(int size);
MOV EAX, [ESP+4] ; get size
ADD EAX,-4 ; include return address as stack space(4bytes)
SUB ESP,EAX
JMP DWORD [ESP+EAX] ; replace RET(do not pop return address)
I recommend the "enter" instruction. Available on 286 and newer processors (may have been available on the 186 as well, I can't remember offhand, but those weren't widely available anyways).

Resources