Virtual address assignment in C and Linux

In the program given below, the virtual address of a is the same in both processes. I understand the reason for global variables, but not for local variables.
How are virtual addresses assigned to local variables before the program runs?
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int a = 0;
    if (fork() == 0)
    {
        a = a + 5;
        printf("%d, %p\n", a, (void *)&a);
    }
    else
    {
        a = a - 5;
        printf("%d, %p\n", a, (void *)&a);
    }
    return 0;
}

Virtual addresses are... virtual. That means the same virtual address in two different processes (like a parent process and its child process) can refer to two different physical addresses.

While compiling, the compiler decides whether to keep each local variable on the stack or in a register. In this case, it is the stack (the program takes the address of a with &a, so a needs a memory location).
The compiler does not choose where the stack itself lives; the operating system sets that up when the program starts. fork() then gives the child an exact copy of the parent's virtual address space, stack included.
So in both processes the stack is at the same (virtual) address. And since the flow of this specific program is deterministic, the stack frames look exactly the same for both processes, resulting in the same offset in the stack for 'a'.

Whatever the address of a was before the fork, it must be the same after the fork, so it is necessarily the same in both processes: each process's address for a equals that pre-fork address. In most implementations, the address of a is computed by adding an offset (determined by the compiler) to the contents of the stack pointer, and fork duplicates the contents of the stack pointer.
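The argument above can be checked directly. This is a minimal sketch assuming a POSIX system; the helper name address_unchanged_by_fork is hypothetical. It saves &a before calling fork(), then checks in both processes that &a still equals the saved address (the child reports its result through its exit status).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: records &a before fork(), then checks in both the
 * parent and the child that &a still equals that pre-fork address.
 * Returns 1 if both agree, 0 if not, -1 on error. */
int address_unchanged_by_fork(void)
{
    int a = 0;
    int *before = &a;                  /* address of a before the fork */

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(&a == before ? 0 : 1);   /* child reports via exit status */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return child_ok && &a == before;   /* parent's own check */
}
```

Called from main, this should return 1: the child starts as a copy of the parent, with the same stack pointer and therefore the same address for a.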

Related

Are memory addresses portable in C?

Say we have program 1...
./program1
int main(int argc, char *argv[])
{
int *i;
*i = 10;
printf("%lld", i);
return 0;
}
Now program 2...
./program2 program1output 10
int main(int argc, char *argv[])
{
int *t;
t = (int*)atoll(argv[1]);
*t = atoi(argv[2]);
return 0;
}
Will this work? Can you share memory addresses between different programs?
This behavior is not defined by the C standard. On any general-purpose multi-user operating system, each process is given its own virtual address space. All of the memory assigned to a process is separate from the memory assigned to other processes except for certain shared memory:
Read-only data may be shared between processes, especially the instructions and constant data of two processes running the same executable and the instructions and constant data of shared libraries. That data may have the same address in different processes or different addresses (depending on various factors, including whether the code is position-independent and whether address space layout randomization is in use).
Some operating systems also map system-wide shared data into processes by default.
Memory may be shared between processes by explicit request of those processes to map shared memory segments. Those segments may or may not appear at the same virtual address in the different processes. (A request to map shared memory may request a certain address, in which case different processes could arrange to use the same address, or it could let the mapping software choose the address, in which case different processes cannot rely on receiving the same address assignment.)
In a special-purpose operating system, different processes could share one address space.
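The "explicit request" case can be sketched as follows, assuming a Linux- or BSD-like system where mmap supports MAP_ANONYMOUS. fork() is used here only as a convenient way to get two processes, and the helper name is hypothetical. Because the mapping is created before fork(), the child inherits it at the same virtual address; unrelated processes mapping a named segment (for example via shm_open) may receive different addresses.

```c
#define _DEFAULT_SOURCE           /* glibc: expose MAP_ANONYMOUS */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: memory mapped with MAP_SHARED before fork() is
 * shared by parent and child, while an ordinary local variable is private
 * to each process. Returns 1 if the child's write is visible only through
 * the shared mapping, 0 if not, -1 on error. */
int shared_mapping_visible_after_fork(void)
{
    void *mem = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    volatile int *shared = mem;   /* volatile: another process writes it */
    *shared = 0;
    int private_value = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(mem, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *shared = 42;             /* lands in memory both processes map */
        private_value = 42;       /* changes only the child's copy */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    int result = (*shared == 42) && (private_value == 0);
    munmap(mem, sizeof(int));
    return result;
}
```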
Supplement
This is not correct code:
int *i;
*i = 10;
The declaration int *i; defines i to be a pointer but does not assign it a value. Then using *i is improper because it attempts to refer to where i points, but i has not been assigned to point to anything.
To define an int and make its address visible in output, you could define int i; and then print &i.
This is not the proper way to print an address:
printf("%lld", i);
To print an address, cast it to void * and format it with %p. The result of the formatting is implementation-defined:
printf("%p", (void *) &i);
This is not a good way to reconstruct an address:
int *t;
t = (int*)atoll(argv[1]);
As with printf, the type should be void *, and there are problems attempting the conversion with atoll. The C standard does not guarantee it will work; the format produced by printing with %p might not be a normal integer format. Instead, use the %p specifier with sscanf:
void *temp;
if (1 != sscanf(argv[1], "%p", &temp))
exit(EXIT_FAILURE);
int *t = temp;
When the address comes from another process, the behavior of the sscanf conversion is not defined by the C standard.
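Within a single run of one program, the %p round trip described above is well defined: C specifies that if the text being scanned was produced earlier in the same program execution, the resulting pointer compares equal to the original. A minimal sketch (the helper name is hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: writes a pointer as text with "%p" and reads it
 * back with sscanf's "%p". The result is guaranteed to compare equal to
 * the original only within the same program execution. */
int pointer_round_trip(void *original, void **parsed)
{
    char buf[64];
    int len = snprintf(buf, sizeof buf, "%p", original);
    if (len < 0 || (size_t)len >= sizeof buf)
        return 0;                 /* formatting failed or was truncated */
    return sscanf(buf, "%p", parsed) == 1;
}
```

Passing the same text to a different program (as program 2 tries to do) is exactly the case the standard leaves undefined.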
In principle, an application operates on its own, private memory. There are ways of sharing memory among different processes, but they require special mechanisms to get around this principle (memory-mapped files, for example). Have a short look at, for example, this article on sharing memory.
In your case, program one will have ended and its memory is not available any more; and the way you access it is definitely not one of the "special mechanisms" necessary to access shared memory:
Though an integer value may be converted to a pointer value, accessing the resulting pointer is only valid if the integer value was originally converted from a pointer to a valid object. This is not the case in your example, since the integral value computed in t = (int*)atoll(argv[1]); never pointed to a valid object in the current program.
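For contrast, here is the case that is valid: a pointer converted to an integer and back within the same program. This sketch assumes the (optional, but available on common platforms) uintptr_t type, for which C guarantees that a void * converted to uintptr_t and back compares equal to the original. The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: round-trips a pointer through uintptr_t within one
 * program. The result is usable only because the integer came from a
 * valid pointer here; an integer read from another program's output
 * carries no such guarantee. */
int *round_trip_through_uintptr(int *p)
{
    uintptr_t n = (uintptr_t)(void *)p;  /* pointer -> integer */
    return (int *)(void *)n;             /* integer -> pointer */
}
```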
In general, memory addresses are tied to processes because each process may have its own memory space. So, the addresses are virtual addresses rather than physical addresses, which means they are references to a location in the process's memory space rather than references to a location on a chip.
(Not all environments have virtual memory. For example, an embedded system might not.)
If you have two programs running in the same process, a pointer can be passed between them. For example, a main program can pass a pointer to a dynamically linked library it loads.

Where are addresses of pointers stored in C?

I'm learning C and currently learn about pointers. I understand the principle of storing the address of a byte in memory as a variable, which makes it possible to get the byte from memory and write to the memory address.
However, I don't understand where the address of a pointer is stored. Let's say the value of a pointer (the address of a byte in memory) is stored somewhere in memory - how can the program know where the pointer is stored? Wouldn't that need a pointer for a pointer resulting in endless pointers for pointers for pointers... ?
UPDATE
The actual question is: "How does the compiler assign memory addresses to variables". And I found this question which points out this topic.
Thanks to everybody who's answered.
This is an implementation detail, but...
Not all addresses are stored in memory. The processor also has registers, which can be used to store addresses. There are only a handful of registers which can be used this way, maybe 16 or 32, compared to the billions of bytes you can store in memory.
Variables in registers
Some variables will get stored in registers. If you need to quickly add up some numbers, for example, the compiler might use, e.g., %eax (which is a register on x86) to accumulate the result. If optimizations are enabled, it is quite common for variables to exist only in registers. Of course, only a few variables can be in registers at any given time, so most variables will need to get written to memory at some point.
If a variable is saved to memory because there aren't enough registers, it is called "spilling". Compilers work very hard to avoid register spilling.
int func()
{
int x = 3;
return x;
// x will probably just be stored in %eax, instead of memory
}
Variables on the stack
Commonly, one register points to a special region called the "stack". So a pointer used by a function may be stored on the stack, and the address of that pointer can be calculated by doing pointer arithmetic on the stack pointer. The stack pointer doesn't have an address because it's a register, and registers don't have addresses.
void func()
{
int x = 3; // address could be "stack pointer + 8" or something like that
}
The compiler chooses the layout of the stack, giving each function a "stack frame" large enough to hold all of that function's variables. If optimization is disabled, variables will usually each get their own slot in the stack frame. With optimization enabled, slots will be reused, shared, or optimized out altogether.
Variables at fixed addresses
Another alternative is to store data at a fixed location, e.g., "address 100".
// global variable... could be stored at a fixed location, such as address 100
int x = 3;
int get_x()
{
return x; // returns the contents of address 100
}
This is actually not uncommon. Remember, that "address 100" doesn't correspond to RAM, necessarily—it is actually a virtual address referring to part of your program's virtual address space. Virtual memory allows multiple programs to all use "address 100", and that address will correspond to a different chunk of physical memory in each running program.
Absolute addresses can also be used on systems without virtual memory, or for programs which don't use virtual memory: bootloaders, operating system kernels, and software for embedded systems may use fixed addresses without virtual memory.
An absolute address is specified by the compiler by putting a "hole" in the machine code, called a relocation.
int get_x()
{
return x; // returns the contents of address ???
// Relocation: please put the address of "x" here
}
The linker then chooses the address for x, and places the address in the machine code for get_x().
Variables relative to the program counter
Yet another alternative is to store data at a location relative to the code that's being executed.
// global variable... could be stored at address 100
int x = 3;
int get_x()
{
// this instruction might appear at address 75
return x; // returns the contents of this address + 25
}
Shared libraries almost always use this technique, which allows the shared library to be loaded at whatever address is available in a program's address space. Unlike programs, shared libraries can't pick their address, because another shared library might pick the same address. Programs can also use this technique, and this is called a "position-independent executable". Programs are built position-independent on systems that lack virtual memory, or, on systems with virtual memory, for additional security: combined with address space layout randomization, it makes it harder to write shell code that relies on fixed addresses.
Just like with absolute addresses, the compiler will put a "hole" in the machine code and ask the linker to fill it in.
int get_x()
{
return x; // return the contents of here + ???
// Relocation: put the relative address of x here
}
A variable that is a pointer is still a variable, and acts like any other variable. The compiler knows where the variable is located and how to access its value. It is just that the value happens to be a memory address, that's all.
The pointer is just a variable. The only difference between this and, e.g. a long variable is that we know that what is stored in a pointer variable is a memory address instead of an integer.
Therefore, you can find the address of a pointer variable by the same way as you can find the address of any other variable. If you store this address in some other variable, this one will also have an address, of course.
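A small sketch of that chain (the function name is hypothetical): x is an int, p is a variable holding the address of x, and pp is another variable holding the address of p. Each level exists only because the code explicitly declares it, so there is no endless regress.

```c
/* Hypothetical helper: stores new_value into x through a pointer to a
 * pointer. p has its own address (&p), which is what pp holds. */
int set_through_pointer_to_pointer(int new_value)
{
    int x = 0;
    int *p = &x;       /* p holds the address of x */
    int **pp = &p;     /* pp holds the address of p */
    **pp = new_value;  /* follow pp to p, then p to x */
    return x;
}
```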
Your confusion seems to originate from the fact that the pointer (i.e. a variable's address) can in turn be stored. But it does not have to be stored anywhere (you only store it when you need this address for some reason). From the point of view of your program, any variable is more or less a named memory location. So the "pointer to the variable" is a named memory location that contains a value that is supposed to "point" to another memory location, hence the name "pointer".
Let's say the value of a pointer (the address of a byte in memory) is stored somewhere in memory
The address of a byte that you allocated, say like this
char ch = 'a';
is referenced by the compiler in the symbol table with the right offset. At run time, the instructions generated by the compiler use this offset to move the value from main memory into a register for some operation on it.
A pointer, in the sense you're asking about, is not stored anywhere: taking a variable's address just yields a value of pointer type. It is stored only if you explicitly create a pointer variable to hold it, like this:
&ch; // address of ch not stored anywhere
char *p = &ch; // now the address of ch is stored in p
Thus there's no recursion concept here.
From the compiler's perspective, whether you declare a pointer or an ordinary variable, it is just a piece of memory. When you declare a variable, a block of memory is allocated to it.
The variable can be either an ordinary variable or a pointer.
So ultimately we have variables (pointers are just variables too), and each of them has a memory location.

Different programs, same variables, same address in memory

I have two C programs.
test.c is
#include <stdlib.h>
int main ()
{
int a;
a = 5;
return a;
}
test2.c is
#include <stdlib.h>
int main ()
{
int a;
a = 6;
return a;
}
When I run them and I check the address in memory of the "a"s with gdb I get the same address. Why is this so?
Breakpoint 1, main () at test.c:7
7 return a;
(gdb) print &a
$1 = (int *) 0x7fffffffe1cc

Breakpoint 1, main () at test2.c:7
7 return a;
(gdb) print &a
$1 = (int *) 0x7fffffffe1cc
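The address of "a" is in the stack frame of your program. This is a virtual address, independent of where in physical memory your program is actually loaded. Therefore, it would not be surprising if both (almost identical) programs used the same address. In addition, GDB disables address space layout randomization by default (its disable-randomization setting is on), so under the debugger the stack starts at the same place on every run; outside GDB, with randomization enabled, the address of a would typically change from run to run.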
Because each application run by the OS gets its own memory space.
Address 0x7fffffffe1cc is not really a physical address. This is done partly for security: a process cannot directly access another process's memory, and it cannot access devices directly either.
You can read more about that here and here
It is very likely that your OS is using Virtual Memory for memory management. What this means is that addresses found within a given program are not 1:1 mapped to physical memory. This allows for a number of things (including running multiple programs that require lots of memory by page swapping to disk).
Without virtual memory, if you were to allocate static int a rather than put it on the stack, the linker would do its best to choose an address for it. If you then linked another program, the linker would not know what other programs may be using that address, so running two programs could stomp on each other's memory. With virtual memory, each program gets its own slice of memory with its own address 0x0 and its own address 0x7fffffffe1cc.

Interesting parent and child behavior while doing fork

Can someone please explain the output of the program below? Why am I getting the same value of &a for both parent and child?
They must have different physical addresses. If I assume I am getting the virtual address, how can they have the same virtual address, since as far as I know each physical address is uniquely bound to a virtual address?
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int pid=fork();
int a=10;
if(pid==0)
{
a=a+5;
printf("%d %d\n",a,&a);
}
else
{
a=a-5;
printf("%d %d\n",a,&a);
}
return 0;
}
The child process inherits its virtual address space from the parent, even though the virtual addresses start referring to different physical addresses once either process writes to a shared page. That's called copy-on-write (CoW) semantics.
So, in the parent &a is mapped to some physical address. Fork initially just copies the mapping. Then, when one of the processes writes to a, CoW kicks in: the physical page that holds a is copied for the writing process, and that process's mapping is updated to refer to the copy. Both processes then have their own copy of a, at the same virtual address &a but at different physical addresses.
each physical address is uniquely bound to virtual address
That's not true. A physical memory address may be unmapped, or it may be mapped to multiple virtual addresses in one or more processes' address spaces.
Conversely, a single virtual address can be mapped to several physical addresses, as long as each mapping lives in a different process's virtual address space.
[Btw., you can't reliably print memory addresses with %d (that just happens to work on 32-bit x86). Use %p instead. Also, the return type of fork is pid_t, not int.]
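Here is a sketch of the question's program with those two fixes applied (pid_t for fork's result, and no %d for addresses), restructured so the outcome can be checked rather than just printed. It assumes a POSIX system; the struct and function names are hypothetical. The child sends its &a (as a uintptr_t) and its value of a to the parent through a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What the child sends back to the parent through the pipe. */
struct child_report {
    uintptr_t addr;   /* child's &a, as an integer */
    int value;        /* child's a after a = a + 5 */
};

/* Hypothetical helper modeled on the question's program. Returns 1 if
 * parent and child saw the same &a but different values of a, 0 if not,
 * -1 on error. */
int fork_same_address_different_values(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();                 /* pid_t, not int */
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    int a = 10;

    if (pid == 0) {
        a = a + 5;                      /* changes only the child's copy */
        struct child_report r = { (uintptr_t)(void *)&a, a };
        close(fds[0]);
        ssize_t n = write(fds[1], &r, sizeof r);
        _exit(n == (ssize_t)sizeof r ? 0 : 1);
    }

    close(fds[1]);
    struct child_report r;
    ssize_t n = read(fds[0], &r, sizeof r);
    close(fds[0]);
    if (waitpid(pid, NULL, 0) == -1 || n != (ssize_t)sizeof r)
        return -1;

    a = a - 5;
    /* Same virtual address, separate values: 15 in the child, 5 here. */
    return r.addr == (uintptr_t)(void *)&a && r.value == 15 && a == 5;
}
```

To see the addresses themselves, print them in each process with printf("%p\n", (void *)&a).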

If different process have their own memory space, how could the address of local variable is the same? [duplicate]

As I understand it, after fork() is called, the local variable is duplicated into the parent process and the child process, and the two copies are separate. But when I print the address of the local variable in each process, it turns out they are the same:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void){
int local = 10;
pid_t childPid;
childPid = fork();
if(childPid == 0 ){
printf("[Child] the local value address is %p\n", (void *)&local);
}else if(childPid < 0){
printf("there is something wrong\n");
}else{
printf("[Parent] the local value address is %p\n", (void *)&local);
}
return (EXIT_SUCCESS);
}
The output is:
[Parent] the local value address is 0x7fff5277baa8
[Child] the local value address is 0x7fff5277baa8
Any idea about this?
Being in a different "space" means that the "same" index point in different spaces does not refer to the same thing. Think of "spaces" as pieces of paper. "The 4th character of the 3rd line" on page 1 does not refer to the same thing as on page 2.
Because the memory space a process gets is virtual. That means the actual physical address on the memory chips can be different. In the case you mentioned, once either process writes to the page holding the local object (copy-on-write), each process has its own private physical copy, so the same virtual address refers to different physical memory in the two processes.
That being said, there are circumstances when two non-local object addresses from different processes map to the same physical address. Most commonly, that could be shared library or shared memory.
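If you do not compile your shared library as position-independent code, its code must be patched (relocated) for the address at which it is loaded, and those writes give each process private copies of the patched pages. Its code can then be shared physically only if the library is loaded at the same virtual address in each concurrent process, in which case the same virtual address maps to the same physical address. Position-independent code avoids this: the same physical pages can be shared even when the library is mapped at different virtual addresses.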

Resources