Basic cdb: does cdb vary how it notates scope? - c

If I compile:
int *a;
void main(void)
{
*a = 1;
}
and then disassemble main in cdb I get:
pointersproject!main:
00000001`3fd51010 mov rax,qword ptr [pointersproject!a (00000001`3fd632f0)]
00000001`3fd51017 mov dword ptr [rax],1
00000001`3fd5101d xor eax,eax
00000001`3fd5101f ret
So *a is symbolized by pointersproject!a. All good.
However, if I declare the pointer within main:
void main(void)
{
int *a;
*a = 1;
}
I see that a is just an offset from the stack pointer (I believe), rather than the human-readable structure I'd expect (like, say, pointersproject!main!a):
pointersproject!main:
00000001`3fd51010 sub rsp,18h
00000001`3fd51014 mov rax,qword ptr [rsp]
00000001`3fd51018 mov dword ptr [rax],1
00000001`3fd5101e xor eax,eax
00000001`3fd51020 add rsp,18h
00000001`3fd51024 ret
This is probably as much about my understanding of what the compiler's done as anything else but: can anyone explain why the notation for a isn't what I expect?
(This inspired by musing while looking at x64 Windows Debugging: Practical Foundations by Dmitry Vostokov).

When a variable is defined inside a function, it is an automatic variable unless explicitly declared static. Such variables only live during the execution of the function and are normally allocated on the stack, so they are deallocated when the function exits. The change you see in the compiled code is not due to the change in scope but to the change from a static to an automatic variable. If you declare a as static, it will not be allocated on the stack, even though its scope is the function main.
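The distinction can be observed from C itself, without a debugger. The following is a minimal sketch with hypothetical function names (they are not from the question): a static local has a single instance for the whole program run, whereas an automatic local is created afresh on every call.

```c
/* Hypothetical helper: 'calls' has static storage duration, so there is
   exactly one instance and it keeps its value between calls. */
int count_with_static(void)
{
    static int calls = 0;
    calls++;
    return calls;
}

/* Hypothetical helper: 'calls' is automatic, so each invocation gets a
   fresh instance (typically a stack slot or a register). */
int count_with_automatic(void)
{
    int calls = 0;
    calls++;
    return calls;
}
```

Calling count_with_static() twice returns 1 and then 2, while count_with_automatic() returns 1 every time.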

Related

If a C function is called twice will it create a variable declared in the function twice? [duplicate]

I have a function written in C which contains a pointer variable like this
#include<stdio.h>
void print()
{
char *hello="hello world";
fprintf(stdout,"%s",hello);
}
void main()
{
print();
print();
}
if i call the print() function twice, will it allocate the memory for the hello variable twice?
if i call the print() function twice, will it allocate the memory for the hello variable twice?
No, it's a string literal and allocated just once.
You can check that by checking the address:
fprintf(stdout,"%p: %s\n", hello, hello);
Sample output:
0x563b972277c4: hello world
0x563b972277c4: hello world
if i call the print() function twice, will it allocate the memory for the hello variable twice?
Yes. However, hello is a pointer and takes up 8 bytes of stack space on a 64 bit machine. The cost is effectively not measurable. Furthermore, the compiler doesn’t necessarily need to allocate the variable at all, since you never attempt to take its address. The compiler is free to effectively transform your code into:
void print()
{
fprintf(stdout, "%s", "hello world");
}
Meaning that the declaration of hello will not result in any memory allocation at runtime. For all intents and purposes, having hello as a local variable is cost-free.
By contrast, the zero-terminated string literal "hello world" is only allocated once, in the data segment of the application. The compiler can do this because it knows that C string literals are readonly, so nothing is allowed to modify it. Furthermore, no dynamic memory allocation is performed at runtime. The memory for the string literal is allocated statically, and its lifetime is the lifetime of the application.
Consequently your print function is essentially as cheap as it can get — it does not perform any actual allocations at all.
You can paste the code at compiler explorer to see what happens. From your code, this is the assembly generated:
print(): # #print()
mov rcx, qword ptr [rip + stdout]
mov edi, offset .L.str
mov esi, 11
mov edx, 1
jmp fwrite # TAILCALL
main: # #main
push rax
mov rcx, qword ptr [rip + stdout]
mov edi, offset .L.str
mov esi, 11
mov edx, 1
call fwrite
mov rcx, qword ptr [rip + stdout]
mov edi, offset .L.str
mov esi, 11
mov edx, 1
call fwrite
xor eax, eax
pop rcx
ret
.L.str:
.asciz "hello world"
The important part is at the end:
.L.str:
.asciz "hello world"
The "hello world" is declared only there like a global variable, and is used every time you call the function print.
It is as if you had declared it like this:
#include<stdio.h>
const char* hello = "hello world";
void print()
{
fprintf(stdout,"%s",hello);
}
void main()
{
print();
print();
}
In this case, the compiler saw the optimisation and made it. But I can't say for sure this will always be the case, because it can depend on the compiler.
The hello variable: automatic variables end their lives when the function exits. The memory is allocated on the stack only while you are inside the function, and it is freed when the function exits.
The string literal has program lifetime and lives wherever literals are stored (usually the .rodata section). This area is filled in when the program is built, and how it is actually represented in memory depends on the implementation.
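The two lifetimes described in these answers can be separated in a small sketch. The function name get_hello is hypothetical; the point is only that the pointer variable is automatic while the array it points to has static storage duration, so the address remains valid (and identical) across calls.

```c
/* Hypothetical helper: the pointer variable 'hello' is automatic and
   disappears on return, but the string literal it points to exists for
   the whole program run, so returning its address is valid. */
const char *get_hello(void)
{
    const char *hello = "hello world";
    return hello;
}
```

Two calls to get_hello() return the same address, because the same literal object is used each time.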

How does assembly code determine how far down variables are placed on the stack?

I am trying to understand some basic assembly code concepts and am getting stuck on how the assembly code determines where to place things on the stack and how much space to give it.
To start playing around with it, I entered this simple code in godbolt.org's compiler explorer.
int main(int argc, char** argv) {
int num = 1;
num++;
return num;
}
and got this assembly code
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov QWORD PTR [rbp-32], rsi
mov DWORD PTR [rbp-4], 1
add DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
So a couple questions here:
Shouldn't the parameters have been placed on the stack before the call? Why are argc and argv placed at offset 20 and 32 from the base pointer of the current stack frame? That seems really far down to put them if we only need room for the one local variable num. Is there a reason for all of this extra space?
The local variable is stored at 4 below the base pointer. So if we were visualizing this in the stack and say the base pointer currently pointed at 0x00004000 (just making this up for an example, not sure if that's realistic), then we place the value at 0x00003FFC, right? And an integer is size 4 bytes, so does it take up the memory space from 0x00003FFC downward to 0x00003FF8, or does it take up the memory space from 0x00004000 to 0x00003FFC?
It looks like stack pointer was never moved down to allow room for this local variable. Shouldn't we have done something like sub rsp, 4 to make room for the local int?
And then if I modify this to add more locals to it:
int main(int argc, char** argv) {
int num = 1;
char *str1 = {0};
char *str2 = "some string";
num++;
return num;
}
Then we get
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-36], edi
mov QWORD PTR [rbp-48], rsi
mov DWORD PTR [rbp-4], 1
mov QWORD PTR [rbp-16], 0
mov QWORD PTR [rbp-24], OFFSET FLAT:.LC0
add DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
So now the main arguments got pushed down even further from base pointer. Why is the space between the first two locals 12 bytes but the space between the second two locals 8 bytes? Is that because of the sizes of the types?
I'm only going to answer this part of the question:
Shouldn't the parameters have been placed on the stack before the call? Why are argc and argv placed at offset 20 and 32 from the base pointer of the current stack frame?
The parameters to main are indeed set up by the code that calls main.
This appears to be code compiled according to the 64-bit ELF psABI for x86, in which the first several parameters to any function are passed in registers, not on the stack. When control reaches the main: label, argc will be in edi, argv will be in rsi, and a third argument conventionally called envp will be in rdx. (You didn't declare that argument, so you can't use it, but the code that calls main is generic and always sets it up.)
The instructions I believe you are referring to
mov DWORD PTR [rbp-20], edi
mov QWORD PTR [rbp-32], rsi
are what compiler nerds call spill instructions: they are copying the initial values of the argc and argv parameters from their original registers to the stack, just in case those registers are needed for something else. As several other people pointed out, this is unoptimized code; these instructions are unnecessary and would not have been emitted if you had turned optimization on. Of course, if you'd turned optimization on you'd have gotten code that doesn't touch the stack at all:
main:
mov eax, 2
ret
In this ABI, the compiler is allowed to put the "spill slots," to which register values are saved, wherever it wants within the stack frame. Their locations do not have to make sense, and may vary from compiler to compiler, from patchlevel to patchlevel of the same compiler, or with apparently-unconnected changes to the source code.
(Some ABIs do specify stack frame layout in some detail, e.g. IIRC the 32-bit Windows ABI does this, to facilitate "unwinding", but that's not important right now.)
(To underline that the arguments to main are in registers, this is the assembly I get at -O1 from
int main(int argc) { return argc + 1; }
:
main:
lea eax, [rdi+1]
ret
Still doesn't do anything with the stack! (Besides ret.))
This is "compiler 101" and what you want to research is "calling convention" and "stack frame". The details are compiler/OS/optimizations dependent. Briefly, incoming parameters may be in registers or on stack. When a function is entered, it may create a stack frame to save some of the registers. And then it may define a "frame pointer" to reference stack locals and stack parameters off the frame pointer. Sometimes the stack pointer is used as a frame pointer as well.
As for registers, usually someone (a company) defines a calling convention that specifies which registers are "volatile", meaning that a routine can use them freely, and which are "preserved", meaning that if a routine uses them, it has to save them on function entry and restore them on exit. The calling convention also specifies which registers (if any) are used for parameter passing and for the function's return value.

Difference between extern and volatile

This question concerns the difference between volatile and extern variables, and also compiler optimization.
One extern variable is defined in the main file and used in one more source file, like this:
ExternTest.cpp:
short ExtGlobal;
void Fun();
int _tmain(int argc, _TCHAR* argv[])
{
ExtGlobal=1000;
while (ExtGlobal < 2000)
{
Fun();
}
return 0;
}
Source1.cpp:
extern short ExtGlobal;
void Fun()
{
ExtGlobal++;
}
The assembly generated for this in VS2012 is as follows:
ExternTest.cpp assembly for accessing the external variable
ExtGlobal=1000;
013913EE mov eax,3E8h
013913F3 mov word ptr ds:[01398130h],ax
while (ExtGlobal < 2000)
013913F9 movsx eax,word ptr ds:[1398130h]
01391400 cmp eax,7D0h
01391405 jge wmain+3Eh (0139140Eh)
Source1.cpp assembly for modifying the extern variable
ExtGlobal++;
0139145E mov ax,word ptr ds:[01398130h]
01391464 add ax,1
01391468 mov word ptr ds:[01398130h],ax
From the above assembly, every access to the variable "ExtGlobal" in the while loop reads the value from the corresponding address. If I add volatile to the external variable, the same assembly code is generated. So volatile usage in two different threads and extern variable usage in two different functions appear to be the same.
Asking about extern and volatile is like asking about peanuts and gorillas. They're completely unrelated.
extern is used simply to tell the compiler, "Hey, don't expect to find the definition of this symbol in this C file. Let the linker fix it up at the end."
volatile essentially tells the compiler, "Never trust the value of this variable. Even if you just stored a value from a register to that memory location, don't re-use the value in the register - make sure to re-read it from memory."
If you want to see that volatile causes different code to be generated, write a series of reads/writes from the variable.
For example, compiling this code in cygwin, with gcc -O1 -c,
int i;
void foo() {
i = 4;
i += 2;
i -= 1;
}
generates the following assembly:
_foo proc near
mov dword ptr ds:_i, 5
retn
_foo endp
Note that the compiler knew what the result would be, so it just went ahead and optimized it.
Now, adding volatile to int i generates the following:
public _foo
_foo proc near
mov dword ptr ds:_i, 4
mov eax, dword ptr ds:_i
add eax, 2
mov dword ptr ds:_i, eax
mov eax, dword ptr ds:_i
sub eax, 1
mov dword ptr ds:_i, eax
retn
_foo endp
The compiler never trusts the value of i, and always re-loads it from memory.
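To show that the two keywords answer different questions, here is a minimal single-file sketch. The names shared_total, add_to_total, sensor and update_sensor are hypothetical. extern only declares a name whose definition lives elsewhere (here, further down the same file to keep the sketch self-contained), while volatile constrains how accesses to an object are compiled.

```c
/* Declaration only: 'extern' says a definition exists somewhere else. */
extern int shared_total;

/* Hypothetical helper that uses the variable before its definition is seen. */
int add_to_total(int amount)
{
    shared_total += amount;
    return shared_total;
}

/* The actual definition that the name resolves to. */
int shared_total = 1000;

/* 'volatile' is an independent qualifier: every read and write of
   'sensor' below must really be performed, in order. */
volatile int sensor;

int update_sensor(void)
{
    sensor = 4;
    sensor += 2;
    sensor -= 1;
    return sensor;
}
```

The observable results are the same with or without volatile (update_sensor() returns 5 either way); only the generated loads and stores differ, as the listings above show.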

Visual Studio Inline assembler not working as expected

Is
MOV MUL_AXB[EBX * 4], EAX
supposed to actually change the effective MUL_AXB address?
I have declared MUL_AXB as
int* MUL_AXB;
at global scope after the using statements, and I have assigned it a value with
MUL_AXB = (int*) memset(malloc(size), 0, size);
Any insight on this issue is highly appreciated.
What you are doing will write in memory somewhere past the mul_axb pointer, causing bad things to happen.
To actually write into the allocated array, you need to load the pointer into a register first. Assuming this is 32-bit code:
mov edx, [mul_axb]
mov [edx + ebx*4], eax
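The same two steps are what a C compiler does implicitly for an indexed store through a pointer. The sketch below spells them out; init_table and store_at are hypothetical helpers, and calloc stands in for the malloc plus memset from the question.

```c
#include <stdlib.h>

int *mul_axb;  /* a pointer variable: it holds the address of the array */

/* Hypothetical helper: allocate 'count' zeroed ints; returns 0 on success. */
int init_table(size_t count)
{
    mul_axb = calloc(count, sizeof *mul_axb);
    return mul_axb != NULL ? 0 : -1;
}

/* Hypothetical helper: store 'value' at 'index' and read it back. */
int store_at(size_t index, int value)
{
    int *base = mul_axb;   /* step 1: load the pointer (mov edx, [mul_axb]) */
    base[index] = value;   /* step 2: write relative to it (mov [edx + ebx*4], eax) */
    return base[index];
}
```

Skipping step 1, as the original instruction does, indexes from the address of the pointer variable itself rather than from the block it points to.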

Alloca implementation

How does one implement alloca() using inline x86 assembler in languages like D, C, and C++? I want to create a slightly modified version of it, but first I need to know how the standard version is implemented. Reading the disassembly from compilers doesn't help because they perform so many optimizations, and I just want the canonical form.
Edit: I guess the hard part is that I want this to have normal function call syntax, i.e. using a naked function or something, make it look like the normal alloca().
Edit # 2: Ah, what the heck, you can assume that we're not omitting the frame pointer.
Implementing alloca actually requires compiler assistance. A few people here are saying it's as easy as:
sub esp, <size>
which is unfortunately only half of the picture. Yes that would "allocate space on the stack" but there are a couple of gotchas.
1. If the compiler has emitted code which references other variables relative to esp instead of ebp (typical if you compile with no frame pointer), then those references need to be adjusted. Even with frame pointers, compilers do this sometimes.
2. More importantly, by definition, space allocated with alloca must be "freed" when the function exits.
The big one is point #2, because you need the compiler to emit code that symmetrically adds <size> to esp at every exit point of the function.
The most likely case is the compiler offers some intrinsics which allow library writers to ask the compiler for the help needed.
EDIT:
In fact, in glibc (GNU's implementation of libc), the implementation of alloca is simply this:
#ifdef __GNUC__
# define __alloca(size) __builtin_alloca (size)
#endif /* GCC. */
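For comparison, this is roughly what using the builtin named in that macro looks like. It is a minimal sketch that assumes GCC or Clang (both provide __builtin_alloca); the function name is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copies 'text' into stack scratch space obtained
   from the compiler builtin and returns the length of the copy.
   Assumes GCC or Clang, which provide __builtin_alloca. */
size_t copy_to_stack_and_measure(const char *text)
{
    size_t size = strlen(text) + 1;          /* include the terminator */
    char *scratch = __builtin_alloca(size);  /* released automatically on return */
    memcpy(scratch, text, size);
    return strlen(scratch);
}
```

Because the compiler itself handles the builtin, it knows the stack pointer has moved and emits the matching cleanup on every return path.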
EDIT:
After thinking about it, the minimum I believe would be required is for the compiler to always use a frame pointer in any function which uses alloca, regardless of optimization settings. This would allow all locals to be referenced through ebp safely, and the frame cleanup would be handled by restoring the frame pointer to esp.
EDIT:
So I did some experimenting with things like this:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#define __alloca(p, N) \
do { \
__asm__ __volatile__( \
"sub %1, %%esp \n" \
"mov %%esp, %0 \n" \
: "=m"(p) \
: "i"(N) \
: "esp"); \
} while(0)
int func() {
char *p;
__alloca(p, 100);
memset(p, 0, 100);
strcpy(p, "hello world\n");
printf("%s\n", p);
}
int main() {
func();
}
which unfortunately does not work correctly. After analyzing the assembly output by gcc, it appears that optimizations get in the way. The problem seems to be that since the compiler's optimizer is entirely unaware of my inline assembly, it has a habit of doing things in an unexpected order and still referencing things via esp.
Here's the resultant ASM:
8048454: push ebp
8048455: mov ebp,esp
8048457: sub esp,0x28
804845a: sub esp,0x64 ; <- this and the line below are our "alloc"
804845d: mov DWORD PTR [ebp-0x4],esp
8048460: mov eax,DWORD PTR [ebp-0x4]
8048463: mov DWORD PTR [esp+0x8],0x64 ; <- whoops! compiler still referencing via esp
804846b: mov DWORD PTR [esp+0x4],0x0 ; <- whoops! compiler still referencing via esp
8048473: mov DWORD PTR [esp],eax ; <- whoops! compiler still referencing via esp
8048476: call 8048338 <memset@plt>
804847b: mov eax,DWORD PTR [ebp-0x4]
804847e: mov DWORD PTR [esp+0x8],0xd ; <- whoops! compiler still referencing via esp
8048486: mov DWORD PTR [esp+0x4],0x80485a8 ; <- whoops! compiler still referencing via esp
804848e: mov DWORD PTR [esp],eax ; <- whoops! compiler still referencing via esp
8048491: call 8048358 <memcpy@plt>
8048496: mov eax,DWORD PTR [ebp-0x4]
8048499: mov DWORD PTR [esp],eax ; <- whoops! compiler still referencing via esp
804849c: call 8048368 <puts@plt>
80484a1: leave
80484a2: ret
As you can see, it isn't so simple. Unfortunately, I stand by my original assertion that you need compiler assistance.
It would be tricky to do this - in fact, unless you have enough control over the compiler's code generation it cannot be done entirely safely. Your routine would have to manipulate the stack, such that when it returned everything was cleaned, but the stack pointer remained in such a position that the block of memory remained in that place.
The problem is that unless you can inform the compiler that the stack pointer has been modified across your function call, it may well decide that it can continue to refer to other locals (or whatever) through the stack pointer, but the offsets will be incorrect.
For the D programming language, the source code for alloca() comes with the download. How it works is fairly well commented. For dmd1, it's in /dmd/src/phobos/internal/alloca.d. For dmd2, it's in /dmd/src/druntime/src/compiler/dmd/alloca.d.
The C and C++ standards don't specify that alloca() has to use the stack, because alloca() isn't in the C or C++ standards (or POSIX for that matter)¹.
A compiler may also implement alloca() using the heap. For example, the ARM RealView (RVCT) compiler's alloca() uses malloc() to allocate the buffer (referenced on their website here), and also causes the compiler to emit code that frees the buffer when the function returns. This doesn't require playing with the stack pointer, but still requires compiler support.
Microsoft Visual C++ has a _malloca() function that uses the heap if there isn't enough room on the stack, but it requires the caller to use _freea(), unlike _alloca(), which does not need/want explicit freeing.
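The stack-or-heap pattern can be sketched in portable C. This is not MSVC's implementation of _malloca(); the threshold SMALL_LIMIT and the function name are hypothetical, chosen only to show a scratch buffer that lives on the stack when small, falls back to the heap otherwise, and therefore needs an explicit release.

```c
#include <stdlib.h>

#define SMALL_LIMIT 64  /* hypothetical threshold, chosen for illustration */

/* Hypothetical helper: sums the squares 1..n using a scratch buffer that
   lives on the stack when small and on the heap otherwise.
   Returns -1 if the heap allocation fails. */
long sum_of_squares(size_t n)
{
    long small[SMALL_LIMIT];
    long *buffer = small;
    long total = 0;

    if (n > SMALL_LIMIT) {
        buffer = malloc(n * sizeof *buffer);
        if (buffer == NULL)
            return -1;
    }

    for (size_t i = 0; i < n; i++)
        buffer[i] = (long)(i + 1) * (long)(i + 1);
    for (size_t i = 0; i < n; i++)
        total += buffer[i];

    if (buffer != small)
        free(buffer);   /* explicit release, the counterpart of _freea() */
    return total;
}
```

The cost of this approach is that the fixed-size array is always reserved, even when the heap path is taken, which a true alloca avoids.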
(With C++ destructors at your disposal, you can obviously do the cleanup without compiler support, but you can't declare local variables inside an arbitrary expression so I don't think you could write an alloca() macro that uses RAII. Then again, apparently you can't use alloca() in some expressions (like function parameters) anyway.)
¹ Yes, it's legal to write an alloca() that simply calls system("/usr/games/nethack").
Continuation Passing Style Alloca
Variable-Length Array in pure ISO C++. Proof-of-Concept implementation.
Usage
void foo(unsigned n)
{
cps_alloca<Payload>(n,[](Payload *first,Payload *last)
{
fill(first,last,something);
});
}
Core Idea
template<typename T,unsigned N,typename F>
auto cps_alloca_static(F &&f) -> decltype(f(nullptr,nullptr))
{
T data[N];
return f(&data[0],&data[0]+N);
}
template<typename T,typename F>
auto cps_alloca_dynamic(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
vector<T> data(n);
return f(&data[0],&data[0]+n);
}
template<typename T,typename F>
auto cps_alloca(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
switch(n)
{
case 1: return cps_alloca_static<T,1>(f);
case 2: return cps_alloca_static<T,2>(f);
case 3: return cps_alloca_static<T,3>(f);
case 4: return cps_alloca_static<T,4>(f);
case 0: return f(nullptr,nullptr);
default: return cps_alloca_dynamic<T>(n,f);
}; // mpl::for_each / array / index pack / recursive bsearch / etc variation
}
LIVE DEMO
cps_alloca on github
alloca is directly implemented in assembly code.
That's because you cannot control stack layout directly from high level languages.
Also note that most implementations will perform some additional work, like aligning the stack for performance reasons.
The standard way of allocating stack space on X86 looks like this:
sub esp, XXX
where XXX is the number of bytes to allocate.
Edit:
If you want to look at the implementation (and you're using MSVC) see alloca16.asm and chkstk.asm.
The code in the first file basically aligns the desired allocation size to a 16-byte boundary. The code in the second file walks all pages that would belong to the new stack area and touches them. This can trigger guard-page (PAGE_GUARD) exceptions, which the OS uses to grow the stack.
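The size-alignment step can be expressed in a line of C. This is a generic sketch, not the code from alloca16.asm; round_up is a hypothetical helper and it assumes the alignment is a power of two.

```c
#include <stddef.h>

/* Hypothetical helper: round 'size' up to the next multiple of 'alignment'.
   Assumes 'alignment' is a power of two and that size + alignment - 1
   does not overflow. */
size_t round_up(size_t size, size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, round_up(17, 16) gives 32 and round_up(16, 16) stays at 16.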
You can examine the sources of an open-source C compiler, like Open Watcom, and find it yourself.
If you can't use C99's Variable Length Arrays, you can use a compound literal cast to a void pointer.
#define ALLOCA(sz) ((void*)((char[sz]){0}))
This also works for -ansi (as a gcc extension) and even when it is a function argument:
some_func(&useful_return, ALLOCA(sizeof(struct useless_return)));
The downside is that when compiled as C++, g++>4.6 will give you an error: taking address of temporary array ... clang and icc don't complain, though.
Alloca is easy: you just move the stack pointer down (the x86 stack grows toward lower addresses), then generate all the reads/writes to point to this new block:
sub esp, 4
What we want to do is something like that:
void* alloca(size_t size) {
<sp> -= size;
return <sp>;
}
In Assembly (Visual Studio 2017, 64bit) it looks like:
;alloca.asm
_TEXT SEGMENT
PUBLIC alloca
alloca PROC
sub rsp, rcx ;<sp> -= size
mov rax, rsp ;return <sp>;
ret
alloca ENDP
_TEXT ENDS
END
Unfortunately our return pointer is the last item on the stack, and we do not want to overwrite it. Additionally we need to take care of the alignment, i.e. round size up to a multiple of 8. So we have to do this:
;alloca.asm
_TEXT SEGMENT
PUBLIC alloca
alloca PROC
;round up to multiple of 8
mov rax, rcx
mov rbx, 8
xor rdx, rdx
div rbx
sub rbx, rdx
mov rax, rbx
mov rbx, 8
xor rdx, rdx
div rbx
add rcx, rdx
;increase stack pointer
pop rbx
sub rsp, rcx
mov rax, rsp
push rbx
ret
alloca ENDP
_TEXT ENDS
END
my_alloca: ; void *my_alloca(int size);
MOV EAX, [ESP+4] ; get size
ADD EAX,-4 ; include return address as stack space(4bytes)
SUB ESP,EAX
JMP DWORD [ESP+EAX] ; replace RET(do not pop return address)
I recommend the "enter" instruction. Available on 286 and newer processors (may have been available on the 186 as well, I can't remember offhand, but those weren't widely available anyways).

Resources