need explanation of how memory addresses work in this C program - c

I have a very simple C program where I am (out of my own curiosity) investigating which memory addresses are used to allocate local variables. My program is:
#include <stdio.h>

int main()
{
    char buffer_1[8], buffer_2[8], buffer_3[8];
    printf("address of buffer_1 %p\n", (void *)buffer_1);
    printf("address of buffer_2 %p\n", (void *)buffer_2);
    printf("address of buffer_3 %p\n", (void *)buffer_3);
    return 0;
}
output is as follows:
address of buffer_1 0x7fff5fbfec30
address of buffer_2 0x7fff5fbfec20
address of buffer_3 0x7fff5fbfec10
My question is: why do the addresses seem to be getting smaller? Is there some logic to this? Thank you.

The compiler is allowed to do whatever it wants with your automatic variables. In this case it just looks like it's putting them consecutively on the stack. On most popular systems in use today, stacks grow downwards.

Most compilers allocate stack memory for local variables in one step, at the very beginning of the function. The memory is allocated as a single contiguous block. Under these circumstances the compiler is, obviously, free to use absolutely any memory layout for the local variables inside that block. It can put them there so that the addresses increase in the order of declaration. Or decrease. Or arrange them randomly. It is an implementation detail, and there's not much logic behind it.
It is quite possible that in your case the compiler tried to "pretend" that the memory for the arrays was allocated on the stack sequentially and independently (even though that was not the case). If on your platform the stack grows downwards (as it does on many platforms), then it is expected that objects declared later will have smaller addresses.
But again, functions don't allocate local objects individually. And on top of that the language makes no guarantees about any relationships between local object addresses. So, there's no real reason to prefer one ordering over the other.
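If you want to observe the growth direction directly, a cleaner experiment than comparing two locals in the same frame (whose ordering, as noted above, is arbitrary) is to compare a local in a caller with a local in a callee. Here is a minimal sketch, assuming a typical stack-based implementation; the comparison goes through uintptr_t because relational comparison of pointers to unrelated objects is not defined by the language, and the result is an observation, not a guarantee:
#include <stdio.h>
#include <stdint.h>

/* Compare the address of a local in the caller with one in the callee.
 * On a typical downward-growing stack the callee's local sits lower. */
static void callee(uintptr_t caller_local_addr)
{
    char callee_local;
    uintptr_t callee_local_addr = (uintptr_t)&callee_local;

    printf("caller local at %p, callee local at %p\n",
           (void *)caller_local_addr, (void *)callee_local_addr);
    printf("stack appears to grow %s\n",
           callee_local_addr < caller_local_addr ? "downwards" : "upwards");
}

int main(void)
{
    char caller_local;
    callee((uintptr_t)&caller_local);
    return 0;
}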

The output of your C program is platform-dependent and compiler-dependent.
There cannot be just one perfect answer because the address arrangements vary based on:
Whether the system is little or big endian.
What kind of OS you are compiling on.
What kind of memory architecture you are compiling for.
What kind of compiler you are using (and compilers might have bugs too).
Whether you are on a 64-bit or 32-bit platform.
And so much more.
But most important of all is the processor architecture. :)
Here is a list of stack growth strategies per processor:
x86, PDP-11: downwards.
System z: in a linked-list fashion, downwards, mostly.
ARM: selectable; can grow either upwards or downwards.
Mostek 6502: downwards (but only 256 bytes).
SPARC: in a circular fashion with a sliding window; a limited-depth stack.
RCA 1802A: subject to the SCRT (Standard Call and Return Technique) implementation.
But, in general, your compiler maps those addresses into the generated binary at compile time. Then at run time the binary may occupy (or may pretend to occupy) a sequential set of memory addresses. In your case, the addresses printed by your C source show that the stack is growing downward.

Basically, the compiler is responsible for allocating memory for all the variables.
An array gets its address on the stack, but that has nothing to do with the output you are getting.
The thing is, the compiler found a contiguous chunk of memory free at that time and allocated it to your program.

Related

Are arrays guaranteed to be contiguous in virtual memory?

int main() {
    char a[3] = {1,2,3};
    return sizeof(a);
}
Is a guaranteed to be in consecutive bytes in virtual memory?
I know it may not be consecutive in physical memory as the mapping is done behind the scenes by the MMU.
If the compiler notices I'm not taking the address of any of the elements, is it then free to put them at non-consecutive addresses in memory, or even put them in a register?
Let's assume the optimizer will not fully get rid of it in my example.
Is a guaranteed to be in consecutive bytes in virtual memory? I know it may not be consecutive in physical memory as the mapping is done behind the scenes by the MMU.
Your code is guaranteed to behave as-if the array was in consecutive bytes of address space.
If the compiler notices I'm not taking the address of any of the elements, is it then free to put them at non-consecutive addresses in memory, or even put them in a register?
It is free to do so even if you do take the address. The compiler can compile your code any way it wants to make it efficient so long as the code doesn't break.
Let's assume the optimizer will not fully get rid of it in my example.
Okay. But it's allowed to. C has an "as-if" rule which means that all rules are just about the behavior your code has to observe, they don't constrain how the compiler (or the machine) get that behavior.
The non-pedantic answer is that, yes, for all intents and purposes, arrays are guaranteed to be stored contiguously in C. This is no accident, it's pretty much a fundamental part of the definition of an array.
The whole point of an array is that you can access any element in constant (that is, O(1)) time, notionally by computing an address, and without having to chase any links as you would with almost any other data structure. So if, as some obscure and invisible implementation detail, an array were somehow not actually stored contiguously, it would always have to act exactly as if it were stored contiguously. And since contiguity is the defining property of an array, I don't think there's any harm in thinking consciously about that property, and assuming that it's always true.
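As a small, hedged illustration of that defining property (the variable names are mine, nothing compiler-specific), contiguity is what makes byte arithmetic from the start of the array and a single memcpy of the whole array work:
#include <stdio.h>
#include <string.h>

int main(void)
{
    int a[4] = {10, 20, 30, 40};
    int copy[4];

    /* Element i starts exactly i * sizeof(int) bytes past the first one. */
    for (size_t i = 0; i < 4; i++) {
        int *p = (int *)((char *)a + i * sizeof(int));
        printf("a[%zu] = %d at %p\n", i, *p, (void *)p);
    }

    /* Contiguity is also what makes one memcpy of the whole array valid. */
    memcpy(copy, a, sizeof a);
    printf("copy[2] = %d\n", copy[2]);
    return 0;
}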
Well, in this particular code snippet a will most likely be stored on the stack, and yes, in this particular case it will be contiguous.
However, at a higher optimization level most compilers would not even store a; they would do constant propagation and just return the size of a.
If you were to allocate memory on the heap, it depends on the allocator: with a buddy or slab allocator the blocks would more often than not be contiguous, but with a naive free-list allocator they may not be contiguous when there are multiple heap allocation calls. For an array this is not an issue, as you will most likely make a single call, so it would still be contiguous.
Tools like objdump, opt from LLVM, and gdb are great if you want to check the disassembly and see how the compiler lays out the code across different optimization levels.
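To illustrate the distinction drawn above, here is a quick sketch (the output is allocator-dependent): one malloc call always returns a single contiguous block, while separate calls may or may not land next to each other:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *one_block = malloc(4 * sizeof *one_block);   /* 4 contiguous ints */
    int *first = malloc(sizeof *first);               /* separate block    */
    int *second = malloc(sizeof *second);             /* another block     */

    if (one_block && first && second) {
        printf("one_block spans %p .. %p\n",
               (void *)one_block, (void *)(one_block + 3));
        printf("separate allocations at %p and %p\n",
               (void *)first, (void *)second);
    }

    free(one_block);
    free(first);
    free(second);
    return 0;
}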
C is defined in terms of an abstract machine. The bytes are guaranteed to be contiguous in the abstract machine.
The real machine can do literally anything so long as the observable behaviour of the program matches an allowed output of the program in the abstract machine.
So, there are no guarantees about any placement of data in the real machine.

Difference between variables' addresses

Why do variable addresses differ by a specific amount each time I run a program? For example, printf("%d %d\n", &a, &b); will print "1000 988" in one run, "924 912" in another, "1288 1276", and so on and so forth.
Does the compiler occupy a set amount of memory after each variable declaration where nothing can be written? If yes, what does it depend on? Using some variables in a program of mine, the smallest difference between them was 12 bytes, and it reached up to 212; that was the only case where the difference was not a multiple of twelve (in the other cases it was 24, 36 or 48 bytes). Is there any reason behind that?
Since my variables were of type int (occupying 4 bytes on my system), could the difference between my variable addresses be less than 12 (for example 4)? Do those address differences depend on the variable types? If yes, in what way? Thank you in advance!
Most OSes today use address-space layout randomization in order to make it harder to write certain kinds of malware. (The kind that writes code to memory and then tries to get the program to hand over control to it; such malware now has to guess which address to make the program jump to.) As a result, variables won't be at the same addresses every time you run a program.
Depending on the type of the variable, how it’s allocated and which OS and architecture you’re running on, the size and alignment of variables will vary. The compiler and runtime might or might not always put them on a four-, eight- or sixteen-byte boundary. For example, the x86_64 function-call ABI always starts a function’s stack frame on a sixteen-byte boundary, and some implementations of malloc() always return an address divisible by sixteen because that’s required to store vectors on some CPUs.
If you want to know what the compiler is doing, you can try compiling to assembly. On gcc or clang, you can do this with the -S flag.
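If you just want the numbers that drive those gaps, C11's _Alignof lets you print each type's size and alignment requirement directly. A small sketch; the values are implementation-defined and will differ between ABIs:
#include <stdio.h>

int main(void)
{
    /* The compiler rounds offsets up so each object meets its alignment. */
    printf("char:        size %zu, align %zu\n", sizeof(char), _Alignof(char));
    printf("int:         size %zu, align %zu\n", sizeof(int), _Alignof(int));
    printf("double:      size %zu, align %zu\n", sizeof(double), _Alignof(double));
    printf("long double: size %zu, align %zu\n", sizeof(long double), _Alignof(long double));
    return 0;
}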
If you're asking why the memory address for a variable differs in between different executable runs the answer is ASLR, which exists to make it harder to exploit security issues in code (see https://en.wikipedia.org/wiki/Address_space_layout_randomization).
If you disable ASLR you will get the same address for a given variable each time you run your executable.
See also Difference between gdb addresses and "real" addresses?
Your linker (and to some degree, your compiler) lays out the address space of your application. The linker typically builds a relocatable image based at some address (e.g., zero). Apparently, your loader is placing the relocatable image at different locations when it is run.
Does the compiler occupy a set amount of memory after each variable declaration where nothing can be written?
Typically no, UNLESS the next variable needs to be aligned. Variables are normally aligned to addresses that are multiples of the variable's size.
It sounds like your compiler is allocating memory for something that you simply are not accounting for.
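Padding inserted to satisfy alignment is the most common thing people don't account for. Here is a hedged sketch using offsetof; the struct is illustrative and the exact numbers are implementation-defined:
#include <stdio.h>
#include <stddef.h>

struct example {
    char   c;   /* 1 byte, then usually padding so that d is 8-byte aligned */
    double d;   /* 8 bytes on most current ABIs                             */
};

int main(void)
{
    printf("sizeof(struct example) = %zu\n", sizeof(struct example));
    printf("offsetof c = %zu, offsetof d = %zu\n",
           offsetof(struct example, c), offsetof(struct example, d));
    return 0;
}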

Stack growing in wrong direction in Linux

I have studied that on Linux systems the stack grows from high memory addresses to low memory addresses. To test this I have written a small piece of code:
#include <stdio.h>

void func() {
    int var1;
    int var2;
    printf("Func: %p %p\n", (void *)&var1, (void *)&var2);
}

int main() {
    int var1;
    int var2;
    printf("Main: %p %p\n", (void *)&var1, (void *)&var2);
    func();
    return 0;
}
When I run this on ideone, I get the following output:
Main: 0xbfd958f0 0xbfd958f4
Func: 0xbfd958f8 0xbfd958fc
According to the textbook, Func's variables should be stored at lower memory addresses than Main's, but what is happening here is the complete opposite. Can somebody explain this behaviour to me? Here is the link to ideone.
Thank you.
Typically the stack grows down from high memory, and the heap grows up from low memory, so they will never "bump into" each other.
The stack can theoretically grow in either direction, though. x86 supports stacks growing either direction but I've never seen anyone use an upward-growing stack on purpose.
The best part is that Intel refers to downward-growing stacks as "grow up" and upward-growing stacks as "grow down."
NOTE:- You should not assume anything about the ordering of local variables inside the stack frame. The compiler might put the "first" variable "first" in the sense of pushing it at the current location, meaning the "first" variable is at a higher address. Or it could organize the variables upward in memory (more likely) giving the "first" variable a lower address. Or it could arrange the variables completely at random. If optimizing, it may even eliminate variables, or use the same memory location for more than one variable if their lifetimes don't overlap.
You can follow this link
BUFFER OVERFLOW 7
but it's still important to know that the return address is not guaranteed to be arranged in any particular way. If -fomit-frame-pointer is used, then the base pointer will not be on the stack. And as I said before, the ordering of local variables conforms to no specific convention.
Another complication is the presence of more than one calling convention in the same program. It is not generally possible just by looking at code addresses to tell what convention a function conforms to. The stack frame may look very different from what you expect.

How do pointers work "under the hood" in C?

Take a simple program like this:
int main(void)
{
    char p;
    char *q;

    q = &p;
    return 0;
}
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime? If at runtime, is there some table of variables or something where it looks these things up? Does the OS keep track of them and it just asks the OS?
My question may not even make sense in the context of the correct explanation, so feel free to set me straight.
How is &p determined? Does the compiler calculate all such references before-hand or is it done at runtime?
This is an implementation detail of the compiler. Different compilers can choose different techniques depending on the kind of operating system they are generating code for and the whims of the compiler writer.
Let me describe for you how this is typically done on a modern operating system like Windows.
When the process starts up, the operating system gives the process a virtual address space of, let's say, 2GB. Of that 2GB, a 1MB section is set aside as "the stack" for the main thread. The stack is a region of memory where everything "below" the current stack pointer is "in use", and everything in that 1MB section "above" it is "free". How the operating system chooses which 1MB chunk of virtual address space is the stack is an implementation detail of Windows.
(Aside: whether the free space is at the "top" or "bottom" of the stack, whether the "valid" space grows "up" or "down" is also an implementation detail. Different operating systems on different chips do it differently. Let's suppose the stack grows from high addresses to low addresses.)
The operating system ensures that when main is invoked, the register ESP contains the address of the dividing line between the valid and free portions of the stack.
(Aside: again, whether the ESP is the address of the first valid point or the first free point is an implementation detail.)
The compiler generates code for main that moves the stack pointer by, let's say, five bytes, subtracting from it if the stack is growing "down". It decreases by five because it needs one byte for p and four for q. So the stack pointer changes; there are now five more "valid" bytes and five fewer "free" bytes.
Let's say that q is the memory that is now in ESP through ESP+3 and p is the memory now in ESP+4. To assign the address of p to q, the compiler generates code that copies the four byte value ESP+4 into the locations ESP through ESP+3.
(Aside: Note that it is highly likely that the compiler lays out the stack so that everything that has its address taken is on an ESP+offset value that is divisible by four. Some chips have requirements that addresses be divisible by pointer size. Again, this is an implementation detail.)
If you do not understand the difference between an address used as a value and an address used as a storage location, figure that out. Without understanding that key difference you will not be successful in C.
That's one way it could work but like I said, different compilers can choose to do it differently as they see fit.
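To make the earlier distinction concrete (an address used as a value versus an address used as a storage location), here is a tiny sketch: &p is an address used as a value (it gets copied into q), while the value stored in q is later used as a storage location (the write through it lands in p):
#include <stdio.h>

int main(void)
{
    char p = 'A';
    char *q = &p;   /* &p used as a VALUE: copied into q          */

    *q = 'B';       /* q's value used as a STORAGE LOCATION:
                       the write goes to p                         */
    printf("p = %c, &p = %p, q holds %p\n", p, (void *)&p, (void *)q);
    return 0;
}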
The compiler cannot know the full address of p at compile-time because a function can be called multiple times by different callers, and p can end up at a different address each time.
Of course, the compiler has to know how to calculate the address of p at run-time, not only for the address-of operator, but simply in order to generate code that works with the p variable. On a regular architecture, local variables like p are allocated on the stack, i.e. in a position with fixed offset relative to the address of the current stack frame.
Thus, the line q = &p simply stores into q (another local variable allocated on the stack) the address p has in the current stack frame.
Note that in general, what the compiler does or doesn't know is implementation-dependent. For example, an optimizing compiler might very well optimize away your entire main after analyzing that its actions have no observable effect. The above is written under the assumption of a mainstream architecture and compiler, and a non-static function (other than main) that may be invoked by multiple callers.
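A short sketch of why the full address cannot be known at compile time: each activation of a function gets its own frame, so the same declarations land at different absolute addresses on each (here, recursive) call, even though their layout within the frame is fixed by the compiler. This assumes a mainstream stack-based implementation and that the calls are not inlined away:
#include <stdio.h>

static void show(int depth)
{
    char p;
    char *q = &p;

    /* The offsets of p and q inside the frame are compile-time constants;
     * the frame itself lands somewhere new on every call. */
    printf("depth %d: &p = %p, q = %p, &q = %p\n",
           depth, (void *)&p, (void *)q, (void *)&q);
    if (depth > 0)
        show(depth - 1);
}

int main(void)
{
    show(2);
    return 0;
}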
This is actually an extraordinarily difficult question to answer in full generality because it's massively complicated by virtual memory, address space layout randomization and relocation.
The short answer is that the compiler basically deals in terms of offsets from some “base”, which is decided by the runtime loader when you execute your program. Your variables, p and q, will appear very close to the “bottom” of the stack (although the stack base is usually very high in VM and it grows “down”).
The address of a local variable cannot be completely calculated at compile time. Local variables are typically allocated on the stack. When called, each function allocates a stack frame - a single contiguous block of memory in which it stores all its local variables. The physical location of the stack frame in memory cannot be predicted at compile time. It will only become known at run-time. The beginning of each stack frame is typically stored at run-time in a dedicated processor register, like ebp on the Intel platform.
Meanwhile, the internal memory layout of a stack frame is pre-determined by the compiler at compile-time, i.e. it is the compiler who decides how local variables will be laid out inside the stack frame. This means that the compiler knows the local offset of each local variable inside the stack frame.
Put this all together and we get that the exact absolute address of a local variable is the sum of the address of the stack frame itself (the run-time component) and the offset of this variable inside that frame (the compile-time component).
This is basically exactly what the compiled code for
q = &p;
will do. It will take the current value of the stack frame register, add some compile-time constant to it (offset of p) and store the result in q.
In any function, the function arguments and the local variables are allocated on the stack, just past the caller's state (including its saved program counter, the return address) that was pushed at the point where it called the current function. How these variables get allocated on the stack and then deallocated when returning from the function is taken care of by the compiler at compile time.
For example, in this case p (1 byte) could be allocated first on the stack, followed by q (4 bytes on a 32-bit architecture). The code assigns the address of p to q. The address of p is then naturally 5 bytes added to or subtracted from the last value of the stack pointer. Well, something like that; it depends on how the stack pointer is updated and whether the stack grows upwards or downwards.
How the return value is passed back to the calling function is something that I'm not certain of, but I'm guessing that it is passed through a register and not the stack. So, when the return executes, the underlying assembly code should deallocate p and q, place zero into that register, then return to the saved position in the caller. Of course, in this case it is the main function, so it is more complicated in that it causes the OS to terminate the process. But in other cases, it just goes back to the calling function.
In ANSI C (C89), local variables must be declared at the beginning of a block, and the compiler typically reserves space for all of a function's locals in one step when the function is entered and releases it when the function returns. In C++ and in later versions of C this becomes more involved, because local variables can be declared anywhere, including inside nested blocks (like if-else or while statement blocks); conceptually, such a local variable is allocated on the stack when entering the block and deallocated when leaving it.
In all cases, the address of a local variable is always a fixed number added or subtracted from the stack pointer (as calculated by the compiler, relative to the containing block) and the size of the variable is determined from the variable type.
However, static local variables and global variables are different in C. These are allocated at fixed locations in memory, and thus there's a fixed address for them (or a fixed offset relative to the base address the process is loaded at), which is calculated by the linker.
Yet a third variety is memory allocated on the heap using malloc/new and free/delete. I think this discussion would be too lengthy if we include that as well.
That said, my description is only for a typical hardware architecture and OS. All of these are also dependent on a wide variety of things, as mentioned by Emmet.
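To see those kinds of storage side by side, here is a small hedged sketch; which region each address falls in, and the relative order of the regions, are implementation details of your platform:
#include <stdio.h>
#include <stdlib.h>

int global_var;                       /* fixed location chosen by linker/loader   */

int main(void)
{
    static int static_local;          /* also a fixed location (data/BSS segment) */
    int automatic;                    /* stack: frame position chosen at run time */
    int *heap = malloc(sizeof *heap); /* heap: position chosen by the allocator   */

    printf("global:    %p\n", (void *)&global_var);
    printf("static:    %p\n", (void *)&static_local);
    printf("automatic: %p\n", (void *)&automatic);
    printf("heap:      %p\n", (void *)heap);

    free(heap);
    return 0;
}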
p is a variable with automatic storage. It lives only as long as the function it is in lives. Every time its function is called memory for it is taken from the stack, therefore, its address can change and is not known until runtime.

How do you know the exact address of a variable?

So I'm looking through my C programming text book and I see this code.
#include <stdio.h>

int j, k;
int *ptr;

int main(void)
{
    j = 1;
    k = 2;
    ptr = &k;

    printf("\n");
    printf("j has the value %d and is stored at %p\n", j, (void *)&j);
    printf("k has the value %d and is stored at %p\n", k, (void *)&k);
    printf("ptr has the value %p and is stored at %p\n", (void *)ptr, (void *)&ptr);
    printf("The value of the integer pointed to by ptr is %d\n", *ptr);
    return 0;
}
I ran it and the output was:
j has the value 1 and is stored at 0x4030e0
k has the value 2 and is stored at 0x403100
ptr has the value 0x403100 and is stored at 0x4030f0
The value of the integer pointed to by ptr is 2
My question is: if I had not run this through a compiler, how would you know the addresses of those variables just by looking at the code? I'm just not sure how to get the actual address of a variable. Thanks!
Here's my understanding of it:
The absolute addresses of things in memory in C are unspecified. They're not standardised into the language. Because of this, you can't know the locations of things in memory by looking at just the code. (However, if you use the same compiler, code, compiler options, runtime and operating system, the addresses may be consistent.)
When you're developing applications, this is not behaviour you should rely on. You may rely on the difference between the locations of two things in some contexts, however. For example, you can determine the difference between the addresses of pointers to two array elements to determine how many elements apart they are.
By the way, if you are considering using the memory locations of variables to solve a particular problem, you may find it helpful to post a separate question asking how to do so without relying on this behaviour.
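The element-distance calculation mentioned above is one of the few address relationships the language actually defines. A minimal example:
#include <stdio.h>

int main(void)
{
    int a[10];
    int *first = &a[2];
    int *second = &a[7];

    /* Subtracting pointers into the same array yields the element
     * distance as a ptrdiff_t, printed with %td. */
    printf("elements apart: %td\n", second - first);   /* prints 5 */
    return 0;
}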
There is no other way to "know the exact address" of a variable in Standard C than to print it with "%p". The actual address is determined by many factors not under the control of the programmer writing the code. It's a matter of the OS, the linker, the compiler, the options used, and probably other things.
That said, in the embedded-systems world there are ways to express "this variable must reside at this address", for example when registers of external devices are mapped into the address space of a running program. This usually happens in what is called a linker file or map file, or by assigning an integral value to a pointer (with a cast). All of these methods are non-standard.
For the purpose of your everyday garden-variety programs, though, the point of writing C programs is that you need not and should not care where your variables are stored.
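For completeness, here is a hedged sketch of that embedded-systems idiom. The address is made up for illustration; on a real target it would come from the chip's datasheet or the linker script, and on a desktop OS you must not actually dereference it:
#include <stdio.h>
#include <stdint.h>

/* Hypothetical memory-mapped control register at a made-up address. */
#define DEVICE_CTRL ((volatile uint32_t *)0x40021000u)

int main(void)
{
    /* Only print the address here; a real driver would read or write it,
     * e.g. *DEVICE_CTRL |= 1u;                                          */
    printf("DEVICE_CTRL is mapped at %p\n", (void *)(uintptr_t)DEVICE_CTRL);
    return 0;
}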
You can't.
Different compilers can put the variables in different places. On some machines the address is not a simple integer anyway.
The compiler only knows things like "the third integer global variable" and "the four bytes allocated 36 bytes down from the stack pointer." It refers to global vars, pointers to subroutines (functions), subroutine arguments and local vars only in relative terms. (Never mind the extra stuff for polymorphic objects in C++, yikes!) These relative references are saved in the object file (.o or .obj) as special codes and offset values.
The Linker can fill in some details. It may modify some of these sketchy location references when joining several object files. Global variable locations will share a space (the Data Section) when globals from multiple compilation units are merged; the linker decides what order they all go in, but still describing them as relative to the start of the entire set of global vars. The result is an executable file with the final opcodes, but addresses still being sketchy and based on relative offsets.
It's not until the executable is loaded that the Loader replaces all the relative addresses with actual addresses. This is possible now, because the loader (or some part of the operating system it depends on) decides where in the whole virtual address space of the process to store the program's opcodes (Text Section), global variables (BSS, Data Sections) and call stack, and other things. The loader can do the math, and write the actual address into every spot in the executable, typically as part of "load immediate" opcodes and all opcodes involving memory access.
Google "relocation table" for more. See http://www.iecc.com/linker/linker07.html (somewhat old) for a more detailed explanation for particular platforms.
In real life, it's all complicated by the fact that virtual addresses are mapped to physical addresses by a virtual memory system, using segments or some other mechanism to keep each process in a separate address space.
I would like to further build upon the answers already provided by pointing out that some toolchains, such as Visual Studio's, support a feature called Address Space Layout Randomization (ASLR), which makes programs load at a random base address as an anti-exploit security measure. Given the addresses that you have in your output, I'd say that you compiled without it (programs without it start at address 0x400000, I think). My source for this information is an answer to this question.
That said, the compiler is what determines the memory addresses at which local variables will be stored. The addresses will most likely change from compiler to compiler, and probably also with each version of the source code.
Every process has its own logical address space starting from zero. The addresses your program can access are all relative to zero. The absolute address of any memory location is decided only after the process is loaded into main memory. This is done using dynamic relocation by modern operating systems. Hence, every time a process is loaded into memory it may be loaded at a different location depending on memory availability. So letting user processes know the exact address of data stored in memory does not make much sense. What your code is printing is a logical address, not the exact or physical address.
Continuing from the answers above, please do not forget that each process runs in its own virtual address space (process isolation). This ensures that when your program corrupts some memory, other running processes are not affected.
Process Isolation:
http://en.wikipedia.org/wiki/Process_isolation
Inter-Process Communication
http://en.wikipedia.org/wiki/Inter-process_communication
