Where are the addresses of variables stored in memory? - C

Whenever we need the address of a variable in C, we use the syntax below, and it prints the address of the variable. What I am trying to understand is whether the address returned is an actual physical memory location or just some random number the compiler produces. Whether it is physical or random, where does that number come from, and where is it stored in memory? Does the address of a memory location itself take up space in memory?
int a = 10;
printf("ADDRESS:%p", (void *)&a);  /* %p is the conversion for pointers; passing a pointer to %d is undefined behavior */
ADDRESS: 2234xxxxxxxx

This location is from the virtual address space, which is allocated to your program. In other words, this is from the virtual memory, which your OS maps to a physical memory, as and when needed.

It depends on what type of system you've got.
Low-end systems such as microcontroller applications often only support physical addresses.
Mid-range CPUs often come with an MMU (memory management unit), which allows so-called virtual memory to be placed on top of the physical memory. This means that a certain part of the code could be working from address 0 to x, though in reality those virtual addresses are just aliases for physical ones.
High-end systems like PCs typically only allow virtual memory access and deny applications direct access to physical memory. They often also use address space layout randomization (ASLR) to produce random address layouts for certain kinds of memory, in order to prevent hacks that exploit hard-coded addresses.
In either case, the actual address itself does not take up space in memory.
Higher abstraction layer concepts such as file systems may however store addresses in look-up tables etc and then they will take up memory.

"… is the address that returned is actual physical memory location or compiler throwing a some random number"
In general-purpose operating systems, the addresses in your C program are virtual memory addresses.[1]
"if it is either physical or random, where did it get those number or where it has to be stored in memory."
The software that loads your program into memory makes the final decisions about what addresses are used[2], and it may inform your program about those addresses in various ways, including:
It may put the start addresses of certain parts of the program in designated processor registers. For example, the start address of the read-only data of your program might be put in R17, and then your program would use R17 as a base address for accessing that data.
It may “fix up” addresses built into your program’s instructions and data. The program’s executable file may contain information about places in your program’s instructions or data that need to be updated when the virtual addresses are decided. After the instructions and data are loaded into memory, the loader will use the information in the file to find those places and update them.
With position-independent code, the program counter itself (a register in the processor that contains the address of the instruction the processor is currently executing or about to execute) provides address information.
So, when your program wants to evaluate &x, it may take the offset of x from the start of the section it is in (that offset is built into the program by the compiler and possibly updated by the linker) and add it to the base address of that section. The resulting sum is the address of x.
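The "base address plus offset" arithmetic can be seen in miniature inside a single object. The sketch below is only an analogy: it uses a struct rather than a program section, and the names struct record and address_is_base_plus_offset are hypothetical, chosen for illustration. The offset is fixed at compile time, the base is only known at run time, and their sum is the member's address.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical record type, used only for illustration. */
struct record {
    char tag;
    int count;
    double value;
};

/* Returns 1 if the address of r->value equals the base address of *r
   plus the compile-time offset of the member, and 0 otherwise. */
int address_is_base_plus_offset(const struct record *r)
{
    const char *base = (const char *)r;              /* known only at run time */
    size_t offset = offsetof(struct record, value);  /* fixed by the compiler */
    return (const char *)&r->value == base + offset;
}
```

Calling the function on any struct record object should return 1, wherever the loader or the stack happened to place that object.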
"actually does address of the memory location takes space in the memory?"
The C standard does not require the program to use any memory for the address of x, &x. The result of &x is a value, like the result of 3*x. The only thing the compiler has to do with a value is ensure it gets used for whatever further expression it is used in. It is not required to store it in memory. However, if the program is dealing with many values in a piece of code, so there are not enough processor registers to hold them all, the compiler may choose to store values in memory temporarily.
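A small sketch of that distinction, with hypothetical function names: in the first function &x is only an intermediate value, while in the second it is stored in a pointer object, which has a size like any other object. Whether the compiler actually keeps either one in a register or in memory is its own decision.

```c
#include <stddef.h>  /* size_t */

/* Uses &x only as a temporary value: nothing forces the compiler to
   store the address anywhere in memory. */
int read_through_address(void)
{
    int x = 42;
    return *&x;
}

/* Stores the address in the pointer object p. Because p is an object,
   this copy of the address has a size and may occupy memory
   (or just a register, if the compiler prefers). */
size_t size_of_stored_address(void)
{
    int x = 42;
    int *p = &x;
    (void)p;          /* silence "unused variable" warnings */
    return sizeof p;
}
```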
Footnotes
[1] Virtual memory is a conceptual or “imaginary” address space. Your program can execute with virtual addresses because the hardware automatically translates virtual addresses to physical addresses while it is executing the program. The operating system creates a map that tells the hardware how to translate virtual addresses to physical addresses. (The map may also tell the hardware certain virtual memory is not actually in physical memory at the moment. In this case, the hardware interrupts the program and starts an operating system routine which deals with the issue. That routine arranges for the needed data to be loaded into memory and then updates the virtual memory map to indicate that.)
[2] There is usually a general scheme for how parts of the program are laid out in memory, such as starting the instructions in one area and setting up space for stack in another area. In modern systems, some randomness is intentionally added to the addresses to foil malicious people trying to take advantage of bugs in programs.

Related

Does a compiler have consider the kernel memory space when laying out memory?

I'm trying to reconcile a few concepts.
I know that virtual memory is shared (mapped) between the kernel and all user processes, which I read here. I also know that when the compiler generates addresses for code + data, the kernel must load them at the correct virtual addresses for that process.
To constrain the scope of the question, I'll just mean gcc when I mention 'the compiler'.
So does the compiler need to be compliant with each new release of an OS, to know not to place code or data at the high memory addresses reserved for the kernel? As in, must someone writing that piece of the compiler know those details of how the kernel plans to load the program (lest the compiler put executable code in high memory)?
Or am I confusing different concepts? I got a bit confused when going through this tutorial, especially at the very bottom where it has OS code in low memory addresses, because I thought Linux uses high memory for the kernel.
The compiler doesn't determine the address ranges in memory at which things are placed. That's handled by the OS.
When the program is first executed, the loader places the various portions of the program and its libraries in memory. For memory that's allocated dynamically, large chunks are allocated from the OS and then sometimes divided into smaller chunks.
The OS loader knows where to load things. And the OS's virtual memory allocation logic knows how to find safe, empty spaces in the address space the process uses.
I'm not sure what you mean by the "high memory addresses reserved for the kernel". If you're talking about a 2G/2G or 3G/1G split on a 32-bit operating system, that is a fundamental design element of those OSes that use it. It doesn't change with versions.
If you're talking about high physical memory, then no. Compilers don't care about physical memory.
Linux gives each application its own memory space, distinct from the kernel. The page table contains the translations between this memory space and physical RAM, and the kernel sets up the page table so there's no interference.
That said, the compiler usually doesn't even care where the program is loaded in memory. Why would it?

memory starting location in C [duplicate]

This question already has an answer here:
Why do virtual memory addresses for linux binaries start at 0x8048000?
I am looking into the memory layout of a given process. I notice that the starting memory location of each process is not 0. On this website, TEXT starts at 0x08048000. One reason could be to distinguish the address from the NULL pointer. I am just wondering if there are any other good reasons? Thanks.
The null pointer doesn't actually have to be 0. It's guaranteed in the C standard that when a 0 value is given in the context of a pointer it's treated as NULL by the compiler.
But the 0 that you use in your source code is just syntactic sugar that has no relation to the actual physical address the null-pointer value is "pointing" to.
For further details see:
Why is NULL/0 an illegal memory location for an object?
Why is address zero used for the null pointer?
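The point about 0 in source code can be checked with a few lines of C. This is a minimal sketch (the function name null_spellings_agree is hypothetical): every source-level spelling of "null" compares equal, whatever bit pattern the implementation uses internally for a null pointer.

```c
#include <stddef.h>  /* NULL */

/* Returns 1 when the source-level spellings of "null" all agree. */
int null_spellings_agree(void)
{
    int *a = 0;     /* a constant 0 in pointer context becomes a null pointer */
    int *b = NULL;  /* the macro form of the same thing */
    return a == b && !a && b == 0;
}
```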
An application on your operating system has its own unique address space, which it sees as a contiguous block of memory (the memory isn't physically contiguous, it's just "the impression" the operating system gives to every program).
For the most part, each process's virtual memory space is laid out in a similar and predictable manner (this is the memory layout in a Linux process, 32-bit mode):
(image from Anatomy of a Program in Memory)
Look at the text segment (the default .text base on x86 is 0x08048000, chosen by the default linker script for static binding).
Why the magical 0x08048000? Likely because Linux borrowed that address from the System V i386 ABI.
... and why then did System V use 0x08048000?
The value was chosen to accommodate the stack below the .text section,
growing downward. The 0x48000 bytes could be mapped by the same page
table already required by the .text section (thus saving a page table
in most cases), while the remaining 0x08000000 would allow more room
for stack-hungry applications.
Is there anything below 0x08048000? There could be nothing (it's only 128M), but you can pretty much map anything you desire there, using the mmap() system call.
See also:
What's the memory before 0x08048000 used for in 32 bit machine?
Reorganizing the address space
mmap
I think this sums it up:
Each process has its own set of page tables, but there is a catch. Once virtual addresses are enabled, they apply to all software running in the machine, including the kernel itself. Thus a portion of the virtual address space must be reserved to the kernel.
So while the process gets its own address space, without a block allocated to the kernel it would not be able to address kernel code and data.
This block, it appears, is always the first block of memory and so includes address 0. The user mode space starts beyond this, and so that is where both the stack and heap reside.
Distinguishing from NULL pointer
Even if the user mode space started at address 0, there would not be any data allocated to the address 0 as that will be in the stack or the heap which themselves do not start at the beginning of the user area. Therefore NULL (with the value of 0) could be used still and is not a reason for this layout.
However one benefit related to the NULL and the first block being kernel memory is any attempt to read/write to NULL throws a Segmentation Fault.
A loader loads a binary into memory in segments: text (constants), data, code. There is no need to start from 0, and since C is prone to bugs that access memory around null, as in a[i], starting there would even be dangerous. Leaving that region unmapped allows (on some processors) such accesses to be intercepted as segmentation faults.
It would be the C runtime introducing a linear address space from 0. That might be imaginable where C is the operating system's implementation language, but having the heap start from 0 serves no purpose. The memory model is one of segments. A code segment might be protected against modification by some processors.
And within segments, allocation happens in memory blocks managed by the C runtime.
I might add, that physical 0 and upwards is often used by the operating system itself.

Can OS generate same logical Address for two different processes?

As far as I know, the CPU generates a logical address for each instruction at run time.
This logical address then points to the linear or virtual address of the instruction.
Now my questions are:
1) Can the OS generate the same logical address for two different processes?
With reference to "In virtual memory, can two different processes have the same address?": if two different processes can have the same virtual address, then it is also quite possible that logical addresses can be the same.
2) Just to clarify my understanding: whenever we write complex C code or a simple "hello world", is the virtual address generated at build time (compile -> assemble -> link), whereas the logical address is generated by the CPU at run time?
Please clarify my doubts above, and do correct me if I am on the wrong track.
The logical address and the virtual address are the same thing. The CPU translates from logical/virtual addresses to physical addresses during execution.
As such, yes, it's not just possible but quite common for two processes to use the same virtual addresses. Under a 32-bit OS this happens quite routinely, simply because the address space is fairly constrained, and there's often more physical memory than address space. But to give one well-known example, the traditional load address for Windows executables is 0x400000 (I might have the wrong number of zeros on the end, but you get the idea). That means essentially every process running on Windows would typically be loaded at that same logical/virtual address.
More recently, Windows (like most other OSes) has started to randomize the layout of executable modules in memory. Since most of a 32-bit address space is often in use, this changes the relative placement of the modules (their order in memory) but means many of the same locations are used in different processes (just for different modules in each).
A 64-bit OS has a much larger address space available, so when it's placing modules at random addresses it has many more choices available. That larger number of choices means there's a much smaller chance of the same address happening to be used in more than one process. It's probably still possible, but certainly a lot less likely.
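On a POSIX system, one way to watch two processes use the same virtual address for different data is fork(), which gives the child a copy of the parent's address space. The sketch below makes several assumptions: a POSIX environment, a hypothetical function name, and a pipe used only to report the child's address back to the parent.

```c
#include <stdint.h>     /* uintptr_t */
#include <sys/types.h>  /* pid_t, ssize_t */
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* pipe, fork, read, write, close, _exit */

/* Returns 1 if parent and child see the same virtual address for
   `value` while holding different contents there, 0 if not,
   and -1 on a system call error. */
int same_address_different_contents(void)
{
    int value = 1;
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                        /* child process */
        close(fds[0]);
        value = 2;                         /* changes only the child's copy */
        uintptr_t addr = (uintptr_t)&value;
        ssize_t n = write(fds[1], &addr, sizeof addr);
        _exit(n == (ssize_t)sizeof addr ? 0 : 1);
    }

    /* parent process */
    close(fds[1]);
    uintptr_t child_addr = 0;
    ssize_t n = read(fds[0], &child_addr, sizeof child_addr);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || n != (ssize_t)sizeof child_addr)
        return -1;

    /* Same virtual address in both processes, but the parent still sees 1. */
    return child_addr == (uintptr_t)&value && value == 1;
}
```

The child writes 2 to its copy of the variable, yet the parent keeps reading 1 at the same address, because the page tables of the two processes map that virtual address to different physical memory once the child writes.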

How Process Size is determined? [closed]

I am very new to these concepts but I want to ask you all a question that is very basic I think, but I am confused, So I am asking it.
The question is...
How is the size of a process determined by the OS?
Let me make it clear first. Suppose I have written a C program and I want to know how much memory it is going to take; how can I determine that? Secondly, I know that a process has several sections, such as the code section, data section and BSS. Are their sizes predetermined? And how are the sizes of the stack and heap determined? Do the sizes of the stack and heap also count when the total size of the process is calculated?
Again, we say that when we load a program, an address space is given to the process (that is done by the base and limit registers and controlled by the MMU, I guess), and when the process tries to access a memory location that is not in its address space we get a segmentation fault. How is it possible for a process to access memory that is not in its address space? According to my understanding, when some buffer overflow happens the address gets corrupted, and when the process then tries to access the corrupted location we get the segmentation fault. Is there any other kind of address violation?
Thirdly, why does the stack grow downward and the heap upward? Is this the same on all OSes? How does it affect performance? Why can't we have it the other way?
Please correct me, if I am wrong in any of the statement.
Thanks
Sohrab
When a process is started it gets its own virtual address space. The size of the virtual address space depends on your operating system. In general, 32-bit processes get 4 GiB (2^32 bytes) of addresses and 64-bit processes get 16 EiB (2^64 bytes, roughly 18 × 10^18) of addresses.
You cannot in any way access anything that is not mapped into your virtual address space as by definition anything that is not mapped there does not have an address for you. You may try to access areas of your virtual address space that are currently not mapped to anything, in which case you get a segfault exception.
Not all of the address space is mapped to something at any given time. Also, not all of it may be mappable at all (how much of it may be mapped depends on the processor and the operating system). On current generation Intel processors up to 256 TiB of your address space may be mapped. Note that operating systems can limit that further. For example, for 32-bit processes (having up to 4 GiB of addresses) Windows by default reserves 2 GiB for the system and 2 GiB for the application (but there's a way to make it 1 GiB for the system and 3 GiB for the application).
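The sizes quoted here follow directly from the number of address bits. A quick way to check the arithmetic, as a sketch (the helper name address_space_gib is hypothetical):

```c
#include <stdint.h>  /* uint64_t, UINT64_C */

/* Size, in GiB, of an address space with `bits` address bits.
   Assumes 30 <= bits <= 64, so the result fits in 64 bits. */
uint64_t address_space_gib(unsigned bits)
{
    /* 2^bits bytes divided by 2^30 bytes per GiB is 2^(bits - 30) GiB. */
    return UINT64_C(1) << (bits - 30);
}
```

With 32 bits this gives 4 GiB; with 48 bits it gives 262144 GiB, which is the 256 TiB figure above; and with 64 bits it gives 2^34 GiB, which is 16 EiB (about 18.4 × 10^18 bytes in decimal units).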
How much of the address space is being used and how much is mapped changes while the application runs. Operating system specific tools will let you monitor what the currently allocated memory and virtual address space is for an application that is running.
Code section, data section, BSS etc. are terms that refer to different areas of the executable file created by the linker. In general code is separate from static immutable data, which is separate from statically allocated but mutable data. The sizes of these sections are computed by the compiler and the linker. Note that each binary file has its own sections, so any dynamically linked libraries will be mapped in the address space separately, each with its own sections mapped somewhere. Heap and stack, however, are separate from all of the above and are not part of the binary image; there is generally just one heap per process and one stack per thread (so one stack in a single-threaded process).
The size of the stack (at least the initial stack) is generally fixed. Compilers and/or linkers generally have some flags you can use to set the size of the stack that you want at runtime. Stacks generally "grow backward" because that's how the processor stack instructions work. Having stacks grow in one direction and the rest grow in the other makes it easier to organize memory in situations where you want both to be unbounded but do not know how much each can grow.
Heap, in general, refers to anything that is not pre-allocated when the process starts. At the lowest level there are several logical operations that relate to heap management (not all are implemented as I describe here in all operating systems).
While the address space is fixed, some OSs keep track of which parts of it are currently reclaimed by the process. Even if this is not the case, the process itself needs to keep track of it. So the lowest level operation is to actually decide that a certain region of the address space is going to be used.
The second low level operation is to instruct the OS to map that region to something. This in general can be
some memory that is not swappable
memory that is swappable and mapped to the system swap file
memory that is swappable and mapped to some other file
memory that is swappable and mapped to some other file in read only mode
the same mapping that another virtual address region is mapped to
the same mapping that another virtual address region is mapped to, but in read only mode
the same mapping that another virtual address region is mapped to, but in copy on write mode with the copied data mapped to the default swap file
There may be other combinations I forgot, but those are the main ones.
Of course the total space used really depends on how you define it. RAM currently used is different than address space currently mapped. But as I wrote above, operating system dependent tools should let you find out what is currently happening.
The sections are predetermined by the executable file.
Besides that one, there may be those of any dynamically linked libraries. While the code and constant data of a DLL is supposed to be shared across multiple processes using it and not be counted more than once, its process-specific non-constant data should be accounted for in every process.
Besides, there can be dynamically allocated memory in the process.
Further, if there are multiple threads in the process, each of them will have its own stack.
What's more, there are going to be per-thread, per-process and per-library data structures in the process itself and in the kernel on its behalf (thread-local storage, command line params, handles to various resources, structures for those resources as well and so on and so forth).
It's difficult to calculate the full process size exactly without knowing how everything is implemented. You might get a reasonable estimate, though.
W.r.t. "According to my understanding when some buffer overflows happens then the address gets corrupted": it's not necessarily true. First of all, the address of what? It depends on what happens to be in the memory near the buffer. If there's an address, it can get overwritten during a buffer overflow. But if there's another buffer nearby that contains a picture of you, the pixels of the picture can get overwritten.
You can get segmentation or page faults when trying to access memory for which you don't have necessary permissions (e.g. the kernel portion that's mapped or otherwise present in the process address space). Or it can be a read-only location. Or the location can have no mapping to the physical memory.
It's hard to tell how the location and layout of the stack and heap are going to affect performance without knowing the performance of what we're talking about. You can speculate, but the speculations can turn out to be wrong.
Btw, you should really consider asking separate questions on SO for separate issues.
"How is it possible for a process to access a memory that is not in its address space?"
Given memory protection it's impossible. But it might be attempted. Consider random pointers or access beyond buffers. If you increment any pointer long enough, it almost certainly wanders into an unmapped address range. Simple example:
char *p = "some string";
while (*p++ != 256) /* Always true. Keeps incrementing p until segfault. */
;
Simple errors like this are not unheard of, to make an understatement.
I can answer to questions #2 and #3.
Answer #2
When in C you use pointers you are really using a numerical value that is interpreted as address to memory (logical address on modern OS, see footnotes). You can modify this address at your will. If the value points to an address that is not in your address space you have your segmentation fault.
Consider for instance this scenario: your OS gives to your process the address range from 0x01000 to 0x09000. Then
int *ptr = (int *)0x01000;  /* illustrative address inside the process's range */
printf("%d", ptr[0]);       /* reads sizeof(int) bytes from your address space */
ptr = (int *)0x09100;       /* illustrative address outside the range */
printf("%d", ptr[0]);       /* access outside your space: segfault */
Mostly the causes of segfault, as you pointed out, are the use of pointers to NULL (that is mostly 0x00 address, but implementation dependent) or the use of corrupted addresses.
Note that, on linux i386, base and limit register are not used as you may think. They are not per-process limits but they point to two kind of segments: user space or kernel space.
Answer #3
The direction of stack growth is hardware dependent, not OS dependent. On i386, assembly instructions like push and pop make the stack grow downwards with regard to the stack-related registers. For instance, the stack pointer automatically decreases when you do a push and increases when you do a pop. The OS cannot change this.
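A common heuristic for observing the growth direction from C is to compare the address of a local variable in a caller with one in a callee. This is only a sketch: the C standard says nothing about stack layout, the function names are hypothetical, and the noinline attribute is a GCC/Clang extension used here on the assumption that it keeps the two locals in separate frames.

```c
#include <stdint.h>  /* uintptr_t */

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))  /* GCC/Clang extension */
#else
#define NOINLINE
#endif

/* Compares the address of a local in this (callee) frame with the
   address of a local in the caller's frame. */
static NOINLINE int compare_with_caller(volatile char *caller_local)
{
    volatile char callee_local = 0;
    uintptr_t callee = (uintptr_t)&callee_local;
    uintptr_t caller = (uintptr_t)caller_local;
    return callee < caller ? -1 : 1;
}

/* Returns -1 if the callee's local sits at a lower address than the
   caller's (stack growing downward), +1 otherwise. */
int stack_growth_direction(void)
{
    volatile char caller_local = 0;
    return compare_with_caller(&caller_local);
}
```

On i386 and x86-64 the result is typically -1, matching the push/pop behavior described above, but an optimizer or a sanitizer may arrange frames differently, so treat the result as an observation rather than a guarantee.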
Footnotes
In a modern OS, a process uses so-called logical addresses. These addresses are mapped to physical addresses by the OS. To see this, compile this simple program yourself:
#include <stdio.h>

int main(void)
{
    int a = 10;
    printf("%p\n", (void *)&a);  /* %p expects a void * argument */
    return 0;
}
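If you run this program multiple times (even simultaneously) on a system without address space layout randomization (ASLR), you would see the same address printed out even for different instances. With ASLR enabled, as is the default on most current systems, the address typically changes from run to run. Either way this is not the real memory address, but a logical address that will be mapped to a physical address when needed.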

How are the different segments like heap, stack, text related to the physical memory?

When a C program is compiled, an object file (ELF) is created. The object file contains different sections such as bss, data, text and other segments. I understood that these sections of the ELF are part of the virtual memory address space. Am I right? Please correct me if I am wrong.
Also, there will be a virtual memory and page table associated with the compiled program. The page table associates the virtual memory addresses present in the ELF with the real physical memory addresses when loading the program. Is my understanding correct?
I read that in the created ELF file, the bss section just keeps references to the uninitialised global variables. Does "uninitialised global variable" here mean a variable that is not initialised at its declaration?
Also, I read that local variables are allocated space at run time (i.e., on the stack). Then how are they referenced in the object file?
If the program has a particular section of code that allocates memory dynamically, how are these variables referenced in the object file?
I am confused about whether these different segments of the object file (like text, rodata, data, bss, stack and heap) are part of the physical memory (RAM), where all the programs are executed.
But I feel that my understanding is wrong. How are these different segments related to the physical memory when a process or a program is in execution?
1. Correct, the ELF file lays out the absolute or relative locations in the virtual address space of a process that the operating system should copy the ELF file contents into. (The bss is just a location and a size; since it's supposed to be all zeros, there is no need to actually have the zeros in the ELF file.) Note that locations can be absolute (like virtual address 0x100000) or relative (like 4096 bytes after the end of text).
2. The virtual memory definition (which is kept in page tables and maps virtual addresses to physical addresses) is not associated with a compiled program, but with a "process" (or "task" or whatever your OS calls it) that represents a running instance of that program. For example, a single ELF file can be loaded into two different processes, at different virtual addresses (if the ELF file is relocatable).
3. The programming language you're using defines which uninitialized state goes in the bss, and which gets explicitly initialized. Note that the bss does not contain "references" to these variables, it is the storage backing those variables.
4. Stack variables are referenced implicitly from the generated code. There is nothing explicit about them (or even the stack) in the ELF file.
5. Like stack references, heap references are implicit in the generated code in the ELF file. (They're all stored in memory created by changing the virtual address space via a call to sbrk or its equivalent.)
The ELF file explains to an OS how to setup a virtual address space for an instance of a program. The different sections describe different needs. For example ".rodata" says I'd like to store read-only data (as opposed to executable code). The ".text" section means executable code. The "bss" is a region used to store state that should be zeroed by the OS. The virtual address space means the program can (optionally) rely on things being where it expects when it starts up. (For example, if it asks for the .bss to be at address 0x4000, then either the OS will refuse to start it, or it will be there.)
Note that these virtual addresses are mapped to physical addresses by the page tables managed by the OS. The instance of the ELF file doesn't need to know any of the details involved in which physical pages are used.
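The zero-filled bss can be observed from C itself, since the language guarantees that objects with static storage duration and no initializer start out as zero. A minimal sketch follows; the variable and function names are hypothetical, and the section each object lands in is the typical toolchain behavior rather than something the C standard specifies.

```c
#include <stddef.h>  /* size_t */

static int counters[1024];   /* no initializer: typically placed in .bss */
static int initialized = 7;  /* nonzero initializer: typically placed in .data */

/* Returns 1 if every element of the uninitialized array reads as zero
   and the initialized variable holds its initial value, 0 otherwise. */
int bss_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof counters / sizeof counters[0]; i++) {
        if (counters[i] != 0)
            return 0;
    }
    return initialized == 7;
}
```

Only the 7 has to be stored in the executable file; for the array, the file needs to record just a location and a size.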
I am not sure if 1, 2 and 3 are correct but I can explain 4 and 5.
4: They are referenced by offset from the top of the stack. When executing a function, the top of the stack is moved to allocate space for local variables. The compiler determines the order of local variables on the stack, so it knows the offset of each variable from the top of the stack.
The stack is positioned upside down in memory. The beginning of the stack usually has the highest memory address available. As the program runs and allocates space for local variables, the address of the top of the stack decreases (and can potentially lead to stack overflow - overlapping with segments at lower addresses :-) )
5: Using pointers - Address of dynamically allocated variable is stored in (local) variable. This corresponds to using pointers in C.
I have found nice explanation here: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/mem.html
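Point 5 can be sketched in a few lines (the function name is hypothetical). The pointer p is an ordinary local variable, so the compiler addresses it like any other local; the int it points to lives in heap memory that only exists once malloc returns at run time, which is why the object file contains no direct reference to it.

```c
#include <stdlib.h>  /* malloc, free */

/* Returns the value read back through a heap pointer held in a local
   variable, or -1 if the allocation fails. */
int heap_via_local_pointer(void)
{
    int *p = malloc(sizeof *p);  /* p is a local; *p lives on the heap */
    if (p == NULL)
        return -1;
    *p = 123;                    /* every access goes through the pointer */
    int result = *p;
    free(p);
    return result;
}
```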
All the addresses of the different sections (.text, .bss, .data, etc.) you see when you inspect an ELF with the size command:
$ size -A -x my_elf_binary
are virtual addresses. The MMU with the operating system performs the translation from the virtual addresses to the RAM physical addresses.
If you want to know these things, learn about the OS, with source code (www.kernel.org) if possible.
You need to realize that the OS kernel is actually running the CPU and managing the memory resource. And C code is just a light weight script to drive the OS and to run only simple operation with registers.
Virtual memory versus physical memory is about the CPU's address translation hardware (the MMU, with the TLB caching page table entries) letting a user-space process use memory that is virtually contiguous.
So the actual physical memory, mapped to the contiguous virtual memory can be scattered to anywhere on the RAM.
Compiled program doesn't know about this TLB stuff and physical memory address stuff. They are managed in the OS kernel space.
BSS is a section which OS prepares as zero filled memory addresses, because they were not initialized in the c/c++ source code, thus marked as bss by the compiler/linker.
The stack starts out as only a small amount of memory prepared by the OS. Every time a function call is made, the stack pointer is pushed down so that there is more space to place the local variables, and it is popped back when you return from the function.
When that first small amount of memory is full and the stack reaches its bottom, a page fault exception occurs; the OS kernel then backs the virtual address with new physical memory and the user process can continue working.
No magic. In object code, every operation done to the pointer returned from malloc is handled as offsets to the register value returned from malloc function call.
Actually malloc is doing quite complex things. There are various implementations (jemalloc/ptmalloc/dlmalloc/googlemalloc/...) for improving dynamic allocations, but actually they are all getting new memory region from the OS using sbrk or mmap(/dev/zero), which is called anonymous memory.
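For a feel of what an allocator does underneath, here is a sketch that asks the OS for anonymous memory directly. It makes some assumptions: a Linux or similar POSIX system, the MAP_ANONYMOUS flag (a widely supported extension, used here instead of mapping /dev/zero), and a hypothetical function name. Real allocators layer bookkeeping on top of calls like this.

```c
#define _DEFAULT_SOURCE  /* asks glibc to expose MAP_ANONYMOUS */
#include <stddef.h>      /* size_t */
#include <sys/mman.h>    /* mmap, munmap */

/* Maps `len` bytes of anonymous memory, checks that the OS handed it
   over zero-filled, writes to it, and unmaps it again.
   Returns 1 if all bytes were zero, 0 if not, -1 if mmap fails. */
int anonymous_mapping_is_zeroed(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int ok = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            ok = 0;
            break;
        }
    }

    p[0] = 1;        /* the mapping is writable, like memory from malloc */
    munmap(p, len);  /* hand the region back to the OS */
    return ok;
}
```

Passing NULL as the first argument leaves the choice of virtual address to the kernel, which is the OS's address space allocation logic at work.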
Just do a man on the command readelf to find out the starting addresses of the different segments of your program.
Regarding the first question you are absolutely right. Since most of today's systems use run-time binding it is only during execution that the actual physical addresses are known. Moreover, it's the compiler and the loader that divide the program into different segments after linking the different libraries during compile and load time. Hence, the virtual addresses.
Coming to the second question it is at the run-time due to runtime binding. The third question is true. All uninitialized global variables and static variables go into BSS. Also note the special case: they go into BSS even if they are initialized to 0.
4.
If you look at the assembler code generated by gcc, you can see that memory for local variables is allocated on the stack through push instructions or by changing the value of the ESP register. They are then initialized with mov or similar instructions.

Resources