Does C at first try to assign a certain address? [duplicate] - c

I'm trying to understand how C allocates memory on the stack. I always thought variables on the stack could be pictured like a struct's member variables: they occupy successive, contiguous blocks of bytes within the stack. To help illustrate an issue I found somewhere, I created this small program which reproduces the phenomenon.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void function(int *i) {
    /* Peek at the int sitting sizeof(int) bytes below the passed address. */
    int *_prev_int = (int *) ((long unsigned int) i - sizeof(int));
    printf("%d\n", *_prev_int);
}

int main(void)
{
    int x = 152;
    int y = 234;
    function(&y);
    return 0;
}
See what I'm doing? Suppose sizeof(int) is 4: I'm looking 4 bytes behind the passed pointer, since that should read the 4 bytes just before where int y sits in the caller's stack frame.
It did not print 152. Strangely, when I look at the next 4 bytes:
int *_prev_int = (int *) ((long unsigned int) i + sizeof(int)) ;
and now it works: it prints whatever is in x inside the caller's stack frame. Why does x have a lower address than y? Are stack variables stored upside down?

Stack organization is completely unspecified and is implementation specific. In practice, it depends a lot on the compiler (even on its version) and on the optimization flags.
Some variables don't even sit on the stack (e.g. because they are just kept in registers, or because the compiler optimized them away, e.g. by inlining, constant folding, etc.).
BTW, you could have some hypothetical C implementation which does not use any stack at all (even if I cannot name such an implementation).
To understand more about stacks:
Read the wikipages on call stacks, tail calls, threads, and continuations
Become familiar with your computer's architecture & instruction set (e.g. x86) & ABI, then ...
ask your compiler to show the assembler code and/or some intermediate compiler representations. If using GCC, compile some simple code with gcc -S -fverbose-asm (to get the assembler code foo.s when compiling foo.c) and try several optimization levels (at least -O0, -O1, -O2 ...). Try also the -fdump-tree-all option (it dumps hundreds of files showing some internal representations of the compiler for your source code). Notice that GCC also provides return address builtins.
Read Appel's old paper arguing that garbage collection can be faster than stack allocation, and understand garbage collection techniques (since they often need to inspect and possibly change some pointers inside call stack frames). To learn more about GC, read the GC handbook.
Sadly, I know of no low-level language (like C, D, Rust, C++, Go, ...) where the call stack is accessible at the language level. This is why coding a garbage collector for C is difficult (since GCs need to scan the call stack for pointers)... But see Boehm's conservative GC for a very practical and pragmatic solution.
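To see that variability concretely, here is a minimal sketch that just prints the addresses the compiler actually chose for two locals; the relative order and the gap between the two addresses are unspecified and change with the compiler, its version, and the optimization flags:

#include <stdio.h>

int main(void)
{
    int x = 152;
    int y = 234;

    /* Print the addresses actually chosen for x and y. Nothing in the C
       standard fixes their order or spacing. */
    printf("&x = %p\n", (void *) &x);
    printf("&y = %p\n", (void *) &y);
    return 0;
}

Compile it a few times with different -O levels and compare the output.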

Almost all processor architectures nowadays support stack-manipulation instructions (e.g. the LDM and STM instructions on ARM). Compilers implement the stack with the help of those instructions. In most cases, the stack pointer decrements when data is pushed onto the stack (the stack grows downwards) and increments when data is popped off it.
So how the stack is implemented depends on the processor architecture and the compiler.
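As a rough, non-portable sketch of that growth direction, compare the address of a local in a callee with the address of a local in its caller; on most mainstream implementations the callee's local ends up at the lower address, but the C standard guarantees nothing here, and an optimizer may keep either variable out of memory entirely:

#include <stdio.h>

/* Prints the caller's and the callee's local addresses. On a typical
   downward-growing stack the second address is lower; this is an
   implementation detail, not a language guarantee. */
static void callee(const void *caller_local)
{
    int inner;
    printf("caller local: %p\ncallee local: %p\n",
           caller_local, (void *) &inner);
}

int main(void)
{
    int outer;
    callee(&outer);
    return 0;
}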

It depends on the compiler and the platform. The same thing can be done in more than one way, as long as it is done consistently by a program (in this case, by the compiler's translation to assembly, i.e. machine code) and the platform supports it (good compilers try to optimize the assembly to get the most out of each platform).
A very good source for a deep understanding of what goes on behind the scenes of C, what happens when a program is compiled and why, is the free book Reverse Engineering for Beginners (Understanding Assembly Language) by Dennis Yurichev; the latest version can be found on his site.

Related

Why are variables stored in an order inverted from the declaration order? [duplicate]


How much memory is used in the following C program?

Suppose the size of int is 4 bytes.
Given the following C code snippet, how many bytes are requested to store the variables?
* I read that some of them can be stored in registers / on the stack, but I'm asking about the total size, so it doesn't matter.
{
    int a, b;
    {
        int c;
    }
    {
        int d, e;
    }
}
Thanks in advance.
You should not care, and it depends a lot upon the optimization flags and the compiler.
A variable could stay entirely in a processor register, and then it does not consume memory (and sometimes it does not appear in the generated machine code at all, because the compiler figured out that it is useless). But read about the call stack, call frames, and register allocation. Of course, a common-sense rule is to avoid huge call frames (e.g. avoid declaring very large automatic variables such as double hugelocalarr[1000000];). A reasonable call frame should (in general) be at most a kilobyte or a few kilobytes (and often the total call stack should not exceed a megabyte or a few megabytes, so you need to think about recursive functions and deeply nested calls).
In practice, if you compile with GCC, look into command-line options such as -Wstack-usage=X (use it with various optimization flags, such as -O1 or -O2 ...). You'll get warnings about functions using a lot of stack (more than X bytes).
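For instance, a deliberately oversized frame like the sketch below should be flagged when compiled with something like gcc -O2 -Wstack-usage=1024 (the exact message depends on the GCC version):

/* A deliberately large automatic array: roughly 32 KiB of stack in one frame. */
double sum_with_big_frame(void)
{
    double big[4096];
    double total = 0.0;
    for (int i = 0; i < 4096; ++i) {
        big[i] = (double) i;
        total += big[i];
    }
    return total;
}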
Also be aware of tail calls. Recent compilers are sometimes able to cleverly optimize them. And think also of inline expansion. Compilers are able to do that when optimizing (even without any inline keyword).
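As a minimal illustration, in the sketch below the recursive call is the last thing the function does, so an optimizing compiler (e.g. GCC at -O2) can usually reuse the current frame instead of growing the stack; that is an optimization, not something the C standard promises:

/* Accumulator-style factorial: the recursive call is in tail position,
   with no work left to do after it returns. */
unsigned long factorial_acc(unsigned long n, unsigned long acc)
{
    if (n <= 1)
        return acc;
    return factorial_acc(n - 1, n * acc);
}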
Read the C is not a low-level language paper by David Chisnall.

How to copy 4 bytes of char buffer into long

Here is an interview question I was asked:
you have the following code:
long num;
char buff[50];
If buff is aligned to address 0, what is the most efficient way for num to get the first 4 bytes of buff?
If we want to get the value of buff at a specific place k (buff[k]), how will we do it?
Is it related to memory alignment?
Regards,
Ron
First, understand that the question is intended to ask about characteristics beyond those specified in the C standard. The C standard does not impose requirements about efficiency, so any question asking about efficiency is necessarily asking about C implementations, not about the C standard. The interviewer is not probing your knowledge of C per se; they are probing your knowledge of modern hardware, compilers, and so on.
As mentioned in xvan’s answer, you could use num = * (long *) buff;. This works given some assumptions implicit in the question. In order for this to work reliably:
long must not have any trap representations, or we must know that the data being copied is not a trap representation.
long must be four bytes.
The compiler must tolerate aliasing. That is, it must not assume that, because the elements of buff are char, we will not access them through a pointer to long.
buff must be four-byte aligned as stated in the question, or the target hardware must support unaligned loads.
These characteristics are not uncommon in C implementations, particularly with corresponding options selected during compilation. The result of this code is likely to be a two-instruction sequence that loads four bytes from memory to a register and that stores four bytes from a register to memory. That is the knowledge I think the interviewer was testing you for.
However, this is not a great solution. As Ilja Everilä noted in a comment, you can simply write memcpy(&num, buff, sizeof num);. This is a proper C-standard way to copy bytes, and a good compiler will optimize it. For example, I just compiled this source code using Apple LLVM 8.1.0 on macOS 10.12.6 with “-O3 -std=c11 -S” (switches that request optimization, use of the 2011 C standard, and assembly code output):
#include <stdint.h>
#include <string.h>

void foo(uint32_t *L, char *A)
{
    memcpy(L, A, sizeof *L);
}
and the resulting routine contains these instructions between the usual routine entry and exit code:
movl (%rsi), %eax
movl %eax, (%rdi)
Thus, the compiler has optimized the memcpy call into a load instruction and a store instruction. This is even though the compiler does not know what the alignment of buff might be. It apparently “believes” that unaligned loads and stores perform reasonably well on the target architecture, so it chose to implement the memcpy directly with load and store instructions rather than explicitly calling a library routine and looping to copy four individual bytes.
If a compiler does not immediately optimize the memcpy call like this, it may need a little help. For example, if the compiler does not know that buff is four-byte aligned, and the target hardware does not perform unaligned four-byte loads well (or at all), then the compiler will not optimize this memcpy into a load-store pair. In that case, some compilers have language extensions that let you tell them a pointer has more than the normal alignment, such as GCC’s __builtin_assume_aligned(), as M.M. mentions. For example, with Apple LLVM, I could do this:
typedef char AlignedBuffer[50] __attribute__((__aligned__(4)));

void foo(uint32_t *L, AlignedBuffer *A)
{
    *L = * (long *) A;
}
That typedef tells the compiler that the AlignedBuffer type is always at least four-byte aligned. This is, of course, an extension to the C language that is not available in all compilers. (Also, when doing this, I would have to make sure to use the compiler option that supports aliasing things through pointers to other types.)
Given that this compiler already knows how to optimize this case, trying to outsmart it with pointer casting is pointless. However, when working with other compilers in other situations, something like the pointer casting may be necessary to get the performance desired. But one needs to know that this is implementation dependent, and the code should be documented as such so that other people know it cannot be ported to other C implementations without addressing these issues.
Regarding the follow-up question, one can write num = * (long *) (buff + k);. The point of this follow-up is likely to probe your knowledge of hardware alignment requirements. On many systems, attempting to load four-byte data from an address that is not four-byte aligned causes an exception. Therefore, this assignment is likely to fail on such hardware when k is not a multiple of four. (Also, we should note that k must be such that all bytes to be loaded are within buff, or are otherwise known to be accessible.) The interviewer likely wanted you to display that knowledge.
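If portability matters more than demonstrating knowledge of alignment traps, a memcpy at the offset sidesteps the issue; the sketch below copies four bytes starting at buff[k] with no alignment assumptions, and a good compiler can still lower it to a single (possibly unaligned) load where the hardware allows it. The caller must keep k + 4 within the buffer.

#include <stdint.h>
#include <string.h>

uint32_t load_at_offset(const char *buff, size_t k)
{
    uint32_t num;
    /* memcpy has no alignment requirement; on suitable targets the
       compiler may turn this into a single load instruction. */
    memcpy(&num, buff + k, sizeof num);
    return num;
}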
Typically with interview questions like this, there is not necessarily a single right answer that the interviewer wants. Mostly, they want to see that you are aware of the issues, have some understanding of them, and have some knowledge of potential ways to address them.

C embedded systems stack and heap size

How could I determine the current stack and heap size of a running C program on an embedded system? Also, how could I discover the maximum stack and heap sizes that my embedded system will allow? I thought about calling malloc() with linearly increasing sizes until it fails in order to find the heap size; however, I am more interested in the size of the stack.
I am using an mbed NXP LPC1768, and I am using an offline compiler developed on GitHub called gcc4mbed.
Any better ideas? All help is greatly appreciated!
For this, look at your linker script; it defines how much space you have allocated to each.
To measure stack usage, do this:
At startup (before C main()), during initialization of memory, init all your stack bytes to known values such as 0xAA or 0xCD. Run your program; at any point you can stop and see how many of the magic values you have left. If you don't see any magic values, then you have overflowed your stack and weirdness may start to happen.
At runtime you can also check the last 4 bytes or so (maybe the last two words; this is really up to you). If they don't match your magic value, then force a reset. This only works if your system is well behaved on reset, and it is best if it starts up quickly and isn't doing something "real time" or mission critical.
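A minimal sketch of that painting idea, assuming a bare-metal target with a full-descending stack (as on the Cortex-M3 in the LPC1768) and a linker script that exports symbols for the stack region; the symbol names __StackLimit and __StackTop are assumptions here, so substitute whatever your linker script actually defines:

#include <stdint.h>
#include <stddef.h>

#define STACK_PAINT 0xCDCDCDCDu

extern uint32_t __StackLimit;   /* assumed symbol: lowest address of the stack  */
extern uint32_t __StackTop;     /* assumed symbol: one past the highest address */

/* Call as early as possible. Paints only the region below the current frame,
   so the part of the stack already in use is left untouched. */
void stack_paint(void)
{
    uint32_t marker;            /* its address approximates the current stack pointer */
    for (uint32_t *p = &__StackLimit; (uintptr_t) p < (uintptr_t) &marker; ++p)
        *p = STACK_PAINT;
}

/* High-water mark: counts painted words that were never overwritten.
   Used stack is roughly total stack size - 4 * stack_unused_words(). */
size_t stack_unused_words(void)
{
    size_t n = 0;
    for (uint32_t *p = &__StackLimit; p < &__StackTop && *p == STACK_PAINT; ++p)
        ++n;
    return n;
}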
Here's a really helpful whitepaper from IAR on the subject.
A crude way of measuring the current stack size at runtime is to declare

static void *mainsp;

then start your main with e.g.:

int main(int argc, char **argv) {
    int here;
    mainsp = (void *) &here;
then inside some leaf routine, when the call stack is deep enough, do something similar to
    int local;
    printf("stack size = %ld\n",
           (long) ((intptr_t) &local - (intptr_t) mainsp));
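Putting those fragments together, a self-contained sketch could look like the following; compile it without optimization (e.g. -O0) so the recursive frames are not merged or turned into a loop, and note that it assumes the usual downward-growing stack, hence the subtraction order:

#include <stdio.h>
#include <stdint.h>

static void *mainsp;                /* address of a local in main() */

static void leaf(void)
{
    int local;
    /* Distance between main()'s frame and this frame, in bytes. */
    printf("approx. stack depth = %ld bytes\n",
           (long) ((intptr_t) mainsp - (intptr_t) &local));
}

static void nest(int n)
{
    volatile int pad[16];           /* make each frame visibly large */
    pad[0] = n;
    if (n > 0)
        nest(n - 1);
    else
        leaf();
}

int main(int argc, char **argv)
{
    int here;
    (void) argc; (void) argv;
    mainsp = (void *) &here;
    nest(10);
    return 0;
}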
Statically estimating, from the full source code of an application, the required stack size is in general undecidable (think of recursion and function pointers), and in practice very difficult (even on a severely restricted class of applications). Look into Couverture. You might also consider customizing a recent GCC compiler with your plugin (perhaps Bismon in mid 2021; email me at basile.starynkevitch#cea.fr about it) for such purposes, but that won't be easy and will give you over-approximations.
If compiling with GCC, you might use the return address builtins to query the stack frame pointer at run time. On some architectures this is not available with some optimization flags. You could also use the -Wstack-usage=byte-size and/or -Wframe-larger-than=byte-size warning options of recent GCC.
As to how heap and stack spaces are distributed, this is system dependent. You might parse the /proc/self/maps file on Linux; see proc(5). You could limit stack space on Linux in user space using setrlimit(2).
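On a hosted Linux/POSIX system (not on the bare-metal mbed target itself), a quick sketch to query the OS-enforced stack limit, which is the limit rather than the current usage, could be:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    /* RLIMIT_STACK: soft and hard limits on the main thread's stack size. */
    if (getrlimit(RLIMIT_STACK, &rl) == 0)
        printf("stack limit: soft=%llu hard=%llu\n",
               (unsigned long long) rl.rlim_cur,
               (unsigned long long) rl.rlim_max);
    return 0;
}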
Be aware, however, of Rice's theorem.
With multi-threaded applications, things can be more difficult. Read some Pthreads tutorial.
Notice that in simple cases, GCC may be capable of tail-call optimizations. You could compile your foo.c with gcc -Os -fverbose-asm -S foo.c and look inside the generated foo.s assembler code.
If you don't care about portability, consider also using the extended asm features of GCC.
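For example, on the ARM Cortex-M3 used here, a non-portable sketch using GCC extended asm to read the current stack pointer (which you could then compare against the stack bounds from your linker script) might look like this:

#include <stdint.h>

/* ARM-only: copy the stack pointer into a C variable. Other architectures
   need a different register name, and extended asm is a GCC extension. */
static inline uint32_t current_sp(void)
{
    uint32_t sp;
    __asm__ volatile ("mov %0, sp" : "=r" (sp));
    return sp;
}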

Strange stack behavior in C

I'm worried that I am misunderstanding something about stack behavior in C.
Suppose that I have the following code:
#include <stdio.h>

int main (int argc, const char * argv[])
{
    int a = 20, b = 25;
    {
        int temp1;
        printf("&temp1 is %p\n", (void *) &temp1);
    }
    {
        int temp2;
        printf("&temp2 is %p\n", (void *) &temp2);
    }
    return 0;
}
Why am I not getting the same address in both printouts? I am getting that temp2 is one int away from temp1, as if temp1 was never recycled.
My expectation is for the stack to contain 20 and 25.
Then have temp1 on top, then have it removed, then have temp2 on top, then have it removed.
I am using gcc on Mac OS X.
Note that I am using the -O0 flag for compiling without optimizations.
To those wondering about the background for this question: I am preparing teaching materials on C, and I am trying to show the students that they should not only avoid returning pointers to automatic variables from functions, but also avoid taking the address of variables from nested blocks and dereferencing them outside. I was trying to demonstrate how this causes problems, and couldn't get the screenshot I wanted.
The compiler is completely within its rights not to optimize temp1 and temp2 into the same location. It has been many years since compilers generated code for one stack operation at a time; these days the whole stack frame is laid out at one go. (A few years back a colleague and I figured out a particularly clever way to do this.) Naive stack layout probably puts each variable in its own slot, even when, as in your example, their lifetimes don't overlap.
If you're curious, you might get different results with gcc -O1 or gcc -O2.
There is no guarantee about which addresses stack objects will receive, regardless of the order in which they are declared.
The compiler can happily reorder the creation and duration of stack variables, provided it does not affect the results of the function.
I believe the C standard just talks about the scope and lifetime of variables defined in a block. It makes no promises about how the variables interact with the stack or if a stack even exists.
I remember reading something about it. All I have now is this obscure link.
Just to let everybody know (and for the sake of the archives), it appears that our kernel extension is running into a known limitation of GCC. Just to recap, we have a function in a very portable, very lightweight library that for some reason is getting compiled with a 1600+ byte stack frame when compiled on/for Darwin. No matter what compiler options I tried, and what optimization levels I used, the stack frame was no smaller than 1400 bytes, which leads to a "machine check" panic in pretty reproducible (but not frequent) situations.
After a lot of searching on the Web, learning some i386 assembly and talking to some people who are much better at assembly, I have learned that GCC is somewhat notorious for having horrid stack allocation. [...]
Apparently this is gcc's dirty little secret, except it's not much of a secret to some: Linus Torvalds has complained several times on various lists about gcc stack allocation (search lkml.org for "gcc stack usage"). Once I knew what to search for, there was plenty of griping about gcc's subpar allocation of stack variables, and in particular its inability to re-use stack space for variables in different scopes.
With that said, my Linux version of gcc properly re-uses stack space; I get the same address for both variables. I'm not sure what the C standard says about it, but strict scope enforcement is only important for code correctness in C++ (due to destruction at the end of the scope), not in C.
There is no standard that sets how variables are placed on the stack. What happens in the compiler is much more complicated. In your code, the compiler may even choose to completely ignore and suppress variables a and b.
During the many stages of the compiler, the code may be converted to its SSA form, and all stack variables lose their addresses and meanings in that form (which may even make things harder for the debugger).
Stack space is very cheap, in the sense that the time to allocate either 2 or 20 variables is constant. Also, stack space is very dynamic for most function calls, since with the exception of a few functions (those nearer main() and thread-entry functions, with long-lived event loops or so), they tend to complete quickly. So, you just don't bother with them.
This is completely dependent on the compiler and how it is configured.

Resources