Does a compiler have consider the kernel memory space when laying out memory? - c

I'm trying to reconcile a few concepts.
I know of virtual memory is shared (mapped) between the kernel and all user processes, which I read here. I also know that when the compiler generates addresses for code + data, the kernel must load them at the correct virtual addresses for that process.
To constrain the scope of the question, I'll just mean gcc when I mention 'the compiler'.
So does the compiler need to be compliant each new release of an OS, to know not to place code or data at the high memory addresses reserved for the kernel? As in, someone writing that piece of the compiler must know those details of how the kernel plans to load the program (lest the compiler put executable code in high memory)?
Or am I confusing different concepts? I got a bit confused when going through this tutorial, especially at the very bottom where it has OS code in low memory addresses, because I thought Linux uses high memory for the kernel.

The compiler doesn't determine the address ranges in memory at which things are placed. That's handled by the OS.
When the program is first executed, the loader places the various portions of the program and its libraries in memory. For memory that's allocated dynamically, large chunks are allocated from the OS and then sometimes divided into smaller chunks.
The OS loader knows where to load things. And the OS's virtual memory allocation logic how to find safe, empty spaces in the address space the process uses.
I'm not sure what you mean by the "high memory addresses reserved for the kernel". If you're talking about a 2G/2G or 3G/1G split on a 32-bit operating system, that is a fundamental design element of those OSes that use it. It doesn't change with versions.
If you're talking about high physical memory, then no. Compilers don't care about physical memory.

Linux gives each application its own memory space, distinct from the kernel. The page table contains the translations between this memory space and physical RAM, and the kernel sets up the page table so there's no interference.
That said, the compiler usually doesn't even care where the program is loaded in memory. Why would it?

Related

What is the main origin of heap and stack memory division?

I read a lot of explanation of heap and stack memory, and all of them obscure anyway in terms of origin. First of all I understand how this memories works with software, but I don't understand the main source of this division. I assume that they are the same unspecialized physical memory, but...
For example say we have PC without any OS, and we want create some bootable program with assembly language for x86. I assume we can do this (Personally I don't know assembly, but some people write OS anyway). So the main question is Can we already operate with heap and stack, or we must create some memory managment machinery for this? If yes, so how it can be possible in terms of bare metal programming?
Adding something to the other answer, fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
1^ thing. Heap is totally software, while stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instruction) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separated stack area - totally different from other ram areas where the program could create a "heap".
2^ thing. Whean speaking about "programs", one can/should think that the operating system (the OS) is a program, specialized in managing resources (memory included), and extendable with "applications" (which are programs). In such scenario, stack and heap are managed in cooperation from both OS and the applications.
So, to reply to your main question, the 90% correct answer is: in bare metal you have already a stack - perhaps you have to issue some short instruction to set it up, but it is straightforward. But you don't have a heap, you must implement it in your program. First you set aside some memory to be used as a stack; and then you can set aside some more memory to be used as a heap, not forgetting that you must preserve some memory for normal/static data. The part of the program that manages the heap should know what to do, using but not erratically overwriting the stack and the static data, to perform its functions.

Where exactly are the variables stored in a C program?

I am new to computer programming. I was studying about variables and came across a definition on the internet:
Variables are the names you give to computer memory locations which are used to store values in a computer program.
What are these memory locations? Do these locations refer to the actual computer memory or this is just a dump in the program itself from where it calls those variables later when we need them?
Also there are other terms that I encountered here on stack overflow like heap and stack. I could not get my head around these. Please help.
The way you've asked the question suggests you expect a single answer. That is simply not the case.
In a rough sense, all variables will exist in memory while your program is being executed. The memory your variables exist in depends both on several things.
Modern computer hardware often has quite a complex physical memory architecture - with multiple levels of cache (in both the CPU, and various peripheral devices), a number of CPU registers, shared memory, different types of RAM, storage devices, EEPROMs, etc. Different systems have these types of memory - and more types - in different proportions.
Operating systems may make memory available to your program in different ways. For example, it may provide virtual memory, using a combination of RAM and reserved hard drive space (and managing mappings, so your program can't tell the difference). This can allow your program to use more memory than is physically available as RAM, but also affects performance, since the operating system must swap memory usage of your program between RAM and the hard drive (which is typically orders of magnitude slower).
A lot of compilers and libraries are implemented to maximise your programs performance (by various measures) - compiler optimisation of your code (which can cause some variables in your code to not even exist when your program is run), library functions crafted for performance, etc. One consequence of this is that the compiler, or library, may use memory in different ways (e.g. some implementations may embed code in your executable to detect memory resources available when the program is run, others may simply assume a fixed amount of RAM), and the usage may even vary over time.

Virtual/Logical Memory and Program relocation

Virtual memory along with logical memory helps to make sure programs do not corrupt each others data.
Program relocation does an almost similar thing of making sure that multiple programs does not corrupt each other.Relocation modifies object program so that it can be loaded at a new, alternate address.
How are virtual memory, logical memory and program relocation related ? Are they similar ?
If they are same/similar, then why do we need program relocation ?
Relocatable programs, or said another way position-independent code, is traditionally used in two circumstances:
systems without virtual memory (or too basic virtual memory, e.g. classic MacOS), for any code
for dynamic libraries, even on systems with virtual memory, given that a dynamic library could find itself lodaded on an address that is not its preferred one if other code is already at that space in the address space of the host program.
However, today even main executable programs on systems with virtual memory tend to be position-independent (e.g. the PIE* build flag on Mac OS X) so that they can be loaded at a randomized address to protect against exploits, e.g. those using ROP**.
* Position Independent Executable
** Return-Oriented Programming
Virtual memory does not prevent programs from interfering with out other. It is logical memory that does so. Unfortunately, it is common for the two concepts to be conflated to "virtual memory."
There are two types of relocation and it is not clear which you are referring to. However, they are connected. On the other hand, the concept is not really related to virtual memory.
The first concept of relocatable code. This is critical for shared libraries that usually have to be mapped to different addresses.
Relocatable code uses offsets rather than absolute addresses. When a program results in an instruction sequence something like:
JMP SOMELABEL
. . .
SOMELABEL:
The computer or assembler encodes this as
JUMP the-number-of-bytes-to-SOMELABEL
rather than
JUMP to-the-address-of-somelabel.
By using offsets the code works the same way no matter where the JMP instruction is located.
The second type of relocation uses the first. In the past relocation was mostly used for libraries. Now, some OS's will load program segments at different places in memory. That is intended for security. It is designed to keep malicious cracks that depend upon the application being loaded at a specific address.
Both of these concepts work with or without virtual memory.
Note that generally the program is not modified to relocated it. I generally, because an executable file will usually have some addresses that need to be fixed up at run time.

Virtual Memory and Relocatable Code

In a 32 bit system, each process virtually has 2^32 bytes of CONTIGUOUS address space. So why the final executable code generated by a linker needs to be relocatable. What is the requirement since all addresses generated would be virtual addresses in the process's own address space and other process CANNOT use the same.
Hence the process can be placed in anywhere it wants to be. Why relocatable?
Some operating systems make the executable code relocatable (this is definitely not universal to all operating systems) to allow for address space layout randomization. This helps mitigate certain attacks.
In the past when stacks were executable a buffer overflow could be exploited by writing executable code directly on the overflowed stack or heap. As operating systems became smarter and started preventing execution of the stack and the heap, attacks became more sophisticated and started using known code sequences in memory by doing return oriented programming. The mitigation to that class of attacks was first done by randomizing the memory layout for shared libraries (since those were easier to exploit) and then when attackers switched to attacking the main executable, by randomizing the memory position of the executable. To make it possible the main executable needs to be relocatable.
Executable code does not always contain relative addresses. On Windows, for example, addressing is often absolute (e.g. for global data).
Consider two different dynamic libraries. Both were compiled for a fixed base address of 0x00100000. Your program tries to load both of them. Where is the loader to place the 2nd DLL? Its preferred base address is already used by the other DLL.
In this case relocatable code helps placing the 2nd DLL at a different address and patching its internal pointers to the new location. With fixed base addresses, loading the 2nd DLL would just fail.
It needs to be relocatable because in order to execute your process needs to be put into the actual main memory in a ready queue. Now where in the main memory it shall be placed is not fixed (it is placed wherever sufficient space is available) so the actual addresses of the instructions varies from its virtual address .
Hence statements making calls to functions ,returns etc need to be updated accordingly pointing to the actual address of those functions

Microcontroller memory allocation

I've been thinking for day about the following question:
In a common pc when you allocate some memory, you ask for it to the OS that keeps track of which memory segments are occupied and which ones are not, and don't let you mess around with other programs memory etc.
But what about a microcontroller, I mean a microcontroller doesn't have an operating system running so when you ask for a bunch of memory what is going on? you cannot simply acess the memory chip and acess a random place cause it may be occupied... who keeps track of which parts of memory are already occupied, and gives you a free place to store something?
EDIT:
I've programmed microcontrollers in C... and I was thinking that the answer could be "language independent". But let me be more clear: supose i have this program running on a microcontroller:
int i=0;
int d=3;
what makes sure that my i and d variables are not stored at the same place in memory?
I think the comments have already covered this...
To ask for memory means you have some operating system managing memory that you are mallocing from (using a loose sense of the term operating system). First you shouldnt be mallocing memory in a microcontroller as a general rule (I may get flamed for that statement). Can be done in cases but you are in control of your memory, you own the system with your application, asking for memory means asking yourself for it.
Unless you have reasons why you cannot statically allocate your structures or arrays or use a union if there are mutually exclusive code paths that might both want much or all of the spare memory, you can try to allocate dynamically and free but it is a harder system engineering problem to solve.
There is a difference between runtime allocation of memory and compile time. your example has nothing to do with the rest of the question
int i=0;
int d=3;
the compiler at compile time allocates two locations in .data one for each of those items. the linker and/or script manages where .data lives and what its limitations are on size, if .data is bigger than what is available you should get a linker warning, if not then you need to fix your linker commands or script to match your system.
runtime allocation is managed at runtime and where and how it manages the memory is determined by that library, even if you have plenty of memory a bad or improperly written library could overlap .text, .data, .bss and/or the stack and cause a lot of problems.
excessive use of the stack is also a pretty serious system engineering problem which coming from non-embedded systems is these days often overlooked because there is so much memory. It is a very real problem when dealing with embedded code on a microcontroller. You need to know your worst case stack usage, leave room for at least that much memory if you are going to have a heap to dynamically allocate, or even if you statically allocate.

Resources