What segments does a compiled C program use?

I read on the OSDev wiki that protected mode on the x86 architecture allows you to create separate segments for code and data, and that you cannot write into the code segment. Also that Windows (yes, this is the platform) loads new code into the code segment, and data are created in the data segment. But if this is the case, how does the program know it must switch to the data segment? Because if I understand it right, all address references point into the segment you run the code from, unless you switch the descriptor. But I also read that the so-called flat memory model allows you to keep code and data within one segment. However, I have read this only in connection with assembler. So, please, what is the case with C compiled code on Windows? Thanks.

There are two meanings of "segment" in this explanation:
an 8086 memory address segment
an object module program section segment
The first is related to what is loaded into an 80386+ segment register; the descriptor it refers to contains a base address, the allocated length (limit), permitted read/write/execute access, and whether the segment grows from low to high or vice versa (plus some more obscure flags).
The second meaning is part of the object module language. Basically, there is a segment named code, a segment named data (which contains initialized data), and a segment for uninitialized data named bss (named for a pseudo-instruction of 1960s assemblers, Block Started by Symbol). When the linker combines object modules, it arranges all the code segments together, all the data segments together elsewhere, and the bss segments together as well.
When the loader maps memory addresses, it looks at the total code space, allocates a CPU memory allocation of at least that size, and either maps that memory to the code in the executable file (in a virtual memory situation) or reads the code into the allocated memory, for which it has to temporarily mark the memory as writable data. The write protection is done through the CPU's paging mechanism as well as the segment register; this protects against attempts to write over code through, for example, an errant data address. The loader does similar setup for the two data segment groups. (Besides those, there is setting up and allocating a stack segment, and mapping shared images.)
As far as the x86 executing instructions goes, each operand has an associated segment register, sometimes explicit and sometimes implicit. Code is implicitly accessed through CS; the stack through SS, which is implied whenever the ESP or EBP register is involved; and DS is implied for most other operands. ES, FS, and GS must be specified as an override in all other cases, except for some of the string instructions like movs and cmps. In the flat model, all the segment registers map to the same address space, though CS doesn't allow writing.
So, to answer your last question, the CPU has four (or more) segment registers set up at once to access the flat virtual memory space of the process. Each operand access is checked for being appropriate to the instruction (like not incrementing a CS address) and also is checked by the paging protection unit for being allowed.

The info you read is outdated. Windows versions since ~1993 use a flat 32-bit virtual memory space. The values of the CS and DS segment registers no longer matter and cannot be changed. There is still a notion of code vs data, now implemented by memory page attributes. Review the allowed values passed in the flNewProtect argument for the VirtualProtectEx() API function.
You very rarely use this API yourself, the attributes are set by the executable image loader and the heap manager.
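As an illustration, here is a minimal sketch (plain Win32 C; error handling omitted) that uses VirtualQuery() to inspect the page attributes of a code address, which is the modern replacement for segment-based protection:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Ask the memory manager for the attributes of the page
       containing main() itself. Code pages are typically
       PAGE_EXECUTE_READ (0x20); writing to them faults. */
    MEMORY_BASIC_INFORMATION mbi;
    if (VirtualQuery((const void *)main, &mbi, sizeof mbi))
        printf("protection of main(): 0x%lx\n", (unsigned long)mbi.Protect);
    return 0;
}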


Where is it documented that a global array in C, compiled by gcc, is initialized like "copy-on-write"?

For this C code:
foobar.c:
static int array[256];
int main() {
    return 0;
}
the array is initialized to all 0's, by the C standard. However, when I compile
gcc -S foobar.c
this produces the assembly code foobar.s that I can inspect, and nowhere in foobar.s is there any initialization of the contents of the array.
Hence I reason that the contents are not initialized; only when an element of the array is accessed is it initialized, kind of like the "copy-on-write" mechanism for fork.
Is my reasoning correct? If so, is this a documented feature, and if so where can I find that documentation?
There's kind of a lot of levels here. This answer addresses Linux in particular, but the same concepts are likely to apply on other systems, possibly with different names.
The compiler requires that the object be "zero initialized". In other words, when a memory read instruction is executed with an address in that range, the value that it reads must be zero. As you say, this is necessary to achieve the behavior dictated by the C standard.
The compiler accomplishes this by asking the assembler to fill the space with zeros, one way or another. It may use the .space or .zero directive, which implicitly requests this. It will also place the object in a section with the special name .bss (the reasons for this name are historical). If you look further up in the assembly output, you should see a directive like .bss or .section .bss. The assembler and linker promise that this entire section will be (somehow) initialized to zero. This is documented in the GNU assembler manual:
The bss section is used for local common variable storage. You may allocate address space in the bss section, but you may not dictate data to load into it before your program executes. When your program starts running, all the contents of the bss section are zeroed bytes.
Okay, so now what do the assembler and linker do to make it happen? Well, an ELF executable file has a program header table, which specifies how and where code and data from the file should be mapped into the program's memory. (Please note that the use of the word "segment" here has nothing to do with the x86 memory segmentation model or segment registers, and is only vaguely related to the term "segmentation fault".) The size of the segment in memory, and the amount of data to be mapped from the file, are specified separately. If the memory size is greater, then all remaining bytes are to be initialized to zero. This is documented in the elf(5) man page:
PT_LOAD
The array element specifies a loadable segment, described by p_filesz and p_memsz. The bytes from the file are mapped to the beginning of the memory segment. If the segment's memory size p_memsz is larger than the file size p_filesz, the "extra" bytes are defined to hold the value 0 and to follow the segment's initialized area.
So the linker ensures that the ELF executable contains such a segment, and that all objects in the .bss section are in this segment, but not within the part that is mapped to the file.
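You can verify this yourself with readelf -l, or with a small C sketch like the following (assuming a 64-bit little-endian ELF and skipping most error handling), which prints p_filesz and p_memsz for each PT_LOAD segment:

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) return 1;

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        fseek(f, (long)(eh.e_phoff + i * sizeof ph), SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1) return 1;
        if (ph.p_type == PT_LOAD)
            /* memsz > filesz means a zero-filled tail: the .bss */
            printf("LOAD: filesz=%llu memsz=%llu\n",
                   (unsigned long long)ph.p_filesz,
                   (unsigned long long)ph.p_memsz);
    }
    fclose(f);
    return 0;
}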
Once all this is done, then the observable behavior is guaranteed: as above, when an instruction attempts to read from this object before it has been written, the value it reads will be zero.
Now as to how that behavior is ensured at runtime: that is the job of the kernel. It could do it by pre-allocating actual physical memory for that range of virtual addresses, and filling it with zeros. Or by an "allocate on demand" method, like what you describe, by leaving those pages unmapped in the CPU's page tables. Then any access to those pages by the application will cause a page fault, which will be handled by the kernel, which will allocate zero-filled physical memory at that time, and then restart the faulting instruction. This is completely transparent to the application. It just sees that the read instruction got the value zero. If there was a page fault, then it just seems to the application like the read instruction took a long time to execute.
The kernel normally uses the "on demand" method, because it is more efficient in case not all of the "zero initialized" memory is actually used. But this is not going to be documented as guaranteed behavior; it is an implementation detail. An application programmer need not care, and in fact must not care, how it works under the hood. If the Linux kernel maintainers decide tomorrow to switch everything to the pre-allocate method, every application will work exactly as it did before, just maybe a little faster or slower.
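The same demand-zero mechanism backs anonymous mmap, so you can observe it without an ELF file at all. A minimal Linux sketch:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Ask the kernel for 1 MiB of anonymous memory. Like the .bss,
       it is guaranteed to read as zero, and the kernel typically
       defers allocating physical pages until each page is first
       touched. */
    size_t len = 1 << 20;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    printf("first byte: %d\n", p[0]); /* reads 0; faults in one zero page */
    munmap(p, len);
    return 0;
}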

Big empty space in memory?

I'm very new to embedded programming (I started yesterday, actually) and I've noticed something I think is strange. I have a very simple program doing nothing but return 0.
int main() {
    return 0;
}
When I run this in IAR Embedded Workbench I have a memory view showing me the program's memory. I've noticed that there is some content, then a big block of empty space, and then content again (I suck at explaining :P so here is an image of the memory).
Please help me understand this a little more than I do now. I don't really know what to search for because I'm so new to this.
The first two lines are the 8 interrupt vectors, stored as 32-bit instructions with the bytes in little-endian order: read them in groups of 4 bytes, with the highest byte last, and then decode as an instruction in the usual way. The first few vectors, including the reset vector at memory location 0, turn out to be LDR instructions, which load an address from memory into the PC register. This causes the processor to jump to that address. (The reset vector is also the first instruction to run when the device is switched on.)
You can look up the encoding of an LDR instruction in the ARM documentation, or at many other places via an internet search. If we rewrite the reset vector 18 f0 95 e5 as e5 95 f0 18, then we see that the PC register is loaded with the address located at an offset of 0x20.
So the next two lines are memory locations referred to by instructions in the first two lines. The reset vector sends the PC to 0x00000080, which is where the C runtime of your program starts. (The other vectors send the PC to 0x00000170, near the end of your program. What that code does is left to the reader.)
Typically, the C runtime is code added to the front of your program that loads the global variables into RAM from flash, and sets the uninitialized RAM to 0. Your program starts after that.
Your original question was: why have such a big gap of unused flash? The answer is that flash memory is not really at a premium, so we can waste a little, and that having extra space there allows for forward-compatibility. If we need to increase the vector table size, then we don't need to move the code around. In fact, this interrupt model has been changed in the new ARM Cortex processors anyway.
Physical (not virtual) memory addresses map to physical circuits. The lowest addresses often map to registers, not RAM arrays. In the interest of consistency, a given address usually maps to the same functionality on different processors of the same family, and missing functionality appears as a small hole in the address mapping.
Furthermore, RAM is assigned to a contiguous address range, after all the I/O registers and housekeeping functions. This produces a big hole between all the registers and the RAM.
Alternately, as @Martin suggests, it may represent uninitialized and read-only Flash memory as -- bytes. Unlike truly unassigned addresses, accessing these is unlikely to produce an exception, and you might even be able to make them "reappear" using appropriate Flash controller commands.
On a modern desktop-class machine, virtual memory hides all this from you, and even parts of the physical address map may be configurable. Many embedded-class processors allow configuration to the extent of specifying the location of the interrupt vector table.
UncleO is right but here is some additional information.
The project's linker command file (*.icf for IAR EW) determines where sections are located in memory. (Look under Project->Options->Linker->Config to identify your linker configuration file.) If you view the linker command file with a text editor you may be able to identify where it locates a section named .intvec (or similar) at address 0x00000000. And then it may locate another section (maybe .text) at address 0x00000080.
You can also see these memory sections identified in the .map file, along with their locations. (Ensure "Generate linker map file" is checked under Project->Options->Linker->List.) The map file is an output from the build, however, and it's the linker command file that determines the locations.
So that space in memory is there because the linker command file instructed it to be that way. I'm not sure whether that space is necessary but it's certainly not a problem. You might be able to experiment with the linker command file and move that second section around. But the exception table (a.k.a. interrupt vector table) must be located at 0x00000000. And you'll want to ensure that the reset vector points to the new location of the startup code if you move it.
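For a feel of how such a table is produced, here is a sketch in GCC syntax (IAR spells this differently, e.g. with #pragma location) of the newer Cortex-M style table, where the vectors hold addresses rather than LDR instructions. All the names here are assumptions, and the linker script must place the .intvec section at 0x00000000:

extern unsigned int _estack;      /* assumed top-of-stack symbol from the linker script */
extern void Reset_Handler(void);  /* assumed startup entry point */

void Default_Handler(void) { for (;;) ; }

__attribute__((section(".intvec"), used))
void (* const vector_table[])(void) = {
    (void (*)(void))&_estack,  /* initial stack pointer value */
    Reset_Handler,             /* reset vector */
    Default_Handler,           /* NMI */
    Default_Handler,           /* HardFault */
};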

How does OS execute binary files in virtual memory?

For example, in my program I call a function foo(). The compiler and assembler will eventually write jmp someaddr in the binary. I know the concept of virtual memory: the program thinks it has the whole memory at its disposal, and the start position is 0x000. That is how the assembler can calculate the position of foo().
But in fact this is not decided until runtime, right? I have to run the program to know where it is loaded, and hence the address of the jmp. But when the program actually runs, how does the OS come in and change the address of the jmp? These are direct CPU instructions, right?
This question can't be answered in general because it's totally hardware and OS dependent. However a typical answer is that the initially loaded program can be compiled as you say: Because the VM hardware gives each program its own address space, all addresses can be fixed when the program is linked. No recalculation of addresses at load time is needed.
Things get much more interesting with dynamically loaded libraries, because two libraries used by the same initially loaded program might be compiled with the same base address, so their address spaces overlap.
One approach to this problem is to require Position Independent Code in DLLs. In such code all addresses are relative to the code itself. Jumps are usually relative to the PC (though a code segment register can also be used). Data are likewise relative to some data segment or base register. To choose the runtime location, the PIC code itself needs no change; only the segment or base register(s) need to be set, in the prelude of every DLL routine.
PIC tends to be a bit slower than position dependent code because there's additional address arithmetic and the PC and/or base registers can bottleneck the processor's instruction pipeline.
So the other approach is for the loader to rebase the DLL code when necessary to eliminate address space overlaps. For this, the DLL must include a table of all the absolute addresses in the code. The loader computes the offset between the assumed code and data base addresses and the actual ones, then traverses the table, adding the offset to each absolute address as the program is copied into VM.
DLLs also have a table of entry points so that the calling program knows where the library procedures start. These must be adjusted as well.
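In spirit, the rebasing pass is nothing more than the following sketch (the structures are invented for illustration; real formats such as PE base relocations are more involved):

#include <stdint.h>
#include <stddef.h>

/* Add the load delta to every absolute 32-bit address recorded
   in the module's relocation table. */
void rebase(uint8_t *image, uint32_t delta,
            const uint32_t *reloc_offsets, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* each table entry is the offset of an absolute address in the image */
        uint32_t *site = (uint32_t *)(image + reloc_offsets[i]);
        *site += delta;  /* assumed base + delta = actual base */
    }
}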
Rebasing is not great for performance either. It slows down loading. Moreover, it defeats sharing of DLL code. You need at least one copy per rebase offset.
For these reasons, DLLs that are part of Windows are deliberately compiled with non-overlapping VM address spaces. This speeds loading and allows sharing. If you ever notice that a 3rd party DLL crunches the disk and loads slowly, while MS DLLs like the C runtime library load quickly, you are seeing the effects of rebasing in Windows.
You can infer more about this topic by reading about object file formats.
Position-independent code is code that you can run from any address. If you have a jmp instruction in position-independent code, it will often be a relative jump, which jumps to an offset from the current location. When you copy the code, it won't change the offsets between parts of the code so it will still work.
Relocatable code is code that you can run from any address, but you might have to modify the code first (maybe you can't just copy it). The code will contain a relocation table which tells how it needs to be modified.
Non-relocatable code is code that must be loaded at a certain address or it will not work.
Each program is different, it depends on how the program was written, or the compiler settings, or other various factors.
Shared libraries are usually compiled as position-independent code, which allows the same library to be loaded at different locations in different processes, without having to load multiple copies into memory. The same copy can be shared between processes, even though it is at a different address in each process.
Executables are often non-relocatable, but they can be position-independent. Virtual memory allows each program to have the entire address space (minus some overhead) to itself, so each executable can choose the address at which it's loaded without worrying about collisions with other executables. Some executables are position-independent, which can be used to increase security (ASLR).
Object files and static libraries are usually relocatable code. The linker will relocate them when combining them to create a shared library, executable, or other image.
Boot loaders and operating system kernels are almost always non-relocatable.
Yes, it is at runtime. The part of the operating system that manages starting and switching tasks is ideally at a different protection level; it has more power. It knows what memory is in use and allocates some for the new task. It configures the MMU so that the new task has a virtual address space starting at zero, or whatever the rule is for that operating system and processor. How you get into user mode at that starting address is very processor-specific.
One method, for example: when an interrupt occurs, the hardware might save some state, not just the address but also the mode or virtual ID or something, let's say on the stack. The return-from-interrupt instruction, as defined by that processor, takes the address and state/mode off the stack and switches there (causing, let's assume, the MMU to react to its next fetch based on the new mode, not the old). For a processor that works like that, you may be able to fake an interrupt return by placing the right items on the stack, such that when you kick off the return-from-interrupt instruction it basically does a jump with the additional features of mode switching, etc.
The ARM family, for example (not Cortex-M), has a processor state register for what you are running now and, in the case of an interrupt or service call, a second state register for where you came from: the state that was interrupted. When you do the proper return, you give it the address and it switches back to that mode using the other register. You can access that register directly from the non-user modes, so you can manipulate the state of the return. There is no return instruction in ARM, just flavors of jump (modifications to the program counter), so it is a special kind of jump.
The short answer is that it is very processor-specific what your choices are for jumping into a task the first time, or returning to a running task after a task switch, in an application mode in a virtual address space. Either directly or indirectly, the processor documentation will describe these modes and how to change them. If it is not described explicitly, you have to figure out on your own, from the instructions and the MMU protections and such, how to switch tasks.

How are the different segments like heap, stack, text related to the physical memory?

When a C program is compiled and the object file (ELF) is created, the object file contains different sections such as bss, data, text and other segments. I understand that these sections of the ELF are part of the virtual memory address space. Am I right? Please correct me if I am wrong.
Also, there will be a virtual memory layout and a page table associated with the compiled program. The page table maps the virtual memory addresses present in the ELF to real physical memory addresses when the program is loaded. Is my understanding correct?
I read that in the created ELF file, the bss section just keeps a reference to the uninitialised global variables. Here "uninitialised global variable" means variables that are not initialised at declaration?
Also, I read that local variables are allocated space at run time (i.e., on the stack). Then how are they referenced in the object file?
If the program contains a section of code that allocates memory dynamically, how are those variables referenced in the object file?
I am confused about whether these different segments of the object file (like text, rodata, data, bss, stack and heap) are part of physical memory (RAM), where all programs are executed.
But I feel that my understanding is wrong. How are these different segments related to physical memory when a process or program is executing?
1. Correct. The ELF file lays out the absolute or relative locations in the virtual address space of a process that the operating system should copy the ELF file contents into. (The bss is just a location and a size, since it's supposed to be all zeros; there is no need to actually store the zeros in the ELF file.) Note that locations can be absolute (like virtual address 0x100000) or relative (like 4096 bytes after the end of text).
2. The virtual memory definition (which is kept in page tables and maps virtual addresses to physical addresses) is not associated with a compiled program, but with a "process" (or "task" or whatever your OS calls it) that represents a running instance of that program. For example, a single ELF file can be loaded into two different processes, at different virtual addresses (if the ELF file is relocatable).
3. The programming language you're using defines which uninitialized state goes in the bss, and which gets explicitly initialized. Note that the bss does not contain "references" to these variables, it is the storage backing those variables.
4. Stack variables are referenced implicitly from the generated code. There is nothing explicit about them (or even the stack) in the ELF file.
5. Like stack references, heap references are implicit in the generated code in the ELF file. (They're all stored in memory created by changing the virtual address space via a call to sbrk or its equivalent.)
The ELF file explains to an OS how to setup a virtual address space for an instance of a program. The different sections describe different needs. For example ".rodata" says I'd like to store read-only data (as opposed to executable code). The ".text" section means executable code. The "bss" is a region used to store state that should be zeroed by the OS. The virtual address space means the program can (optionally) rely on things being where it expects when it starts up. (For example, if it asks for the .bss to be at address 0x4000, then either the OS will refuse to start it, or it will be there.)
Note that these virtual addresses are mapped to physical addresses by the page tables managed by the OS. The instance of the ELF file doesn't need to know any of the details involved in which physical pages are used.
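As a small illustration (assuming a typical ELF toolchain; check the actual placement with objdump -t or size), here is which section each kind of object lands in:

const char msg[] = "hello"; /* .rodata: read-only data */
int counter = 42;           /* .data: initialized, writable */
int scratch[256];           /* .bss: zero-filled, occupies no file bytes */

int main(void)              /* .text: executable code */
{
    return counter + scratch[0];
}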
I am not sure if 1, 2 and 3 are correct but I can explain 4 and 5.
4: They are referenced by their offset from the top of the stack. When a function executes, the stack grows to make space for its local variables. The compiler determines the order of local variables on the stack, so it knows each variable's offset from the top of the stack.
The stack in memory is positioned upside down: the beginning of the stack usually has the highest memory address available. As the program runs and allocates space for local variables, the address of the top of the stack decreases (and can potentially lead to stack overflow: overlapping with segments at lower addresses :-) )
5: Using pointers. The address of a dynamically allocated variable is stored in a (local) variable. This corresponds to using pointers in C.
I have found a nice explanation here: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/mem.html
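For point 5, a tiny sketch of what that looks like in C:

#include <stdlib.h>

void demo(void)
{
    int *p = malloc(100 * sizeof *p); /* p itself is a local (stack) variable;
                                         the 100 ints live on the heap */
    if (p) {
        p[0] = 7;  /* the generated code addresses the heap via the value in p */
        free(p);
    }
}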
All the addresses of the different sections (.text, .bss, .data, etc.) you see when you inspect an ELF with the size command:
$ size -A -x my_elf_binary
are virtual addresses. The MMU, together with the operating system, performs the translation from virtual addresses to physical RAM addresses.
If you want to know these things, learn about the OS, with source code (www.kernel.org) if possible.
You need to realize that the OS kernel is actually running the CPU and managing the memory resource, and that C code is just a lightweight script driving the OS, running only simple operations with registers.
Virtual versus physical memory is about the CPU's TLB (driven by page tables) letting a user-space process see contiguous memory virtually.
So the actual physical memory mapped to that contiguous virtual memory can be scattered anywhere in RAM.
A compiled program doesn't know about this TLB stuff or physical memory addresses; they are managed in OS kernel space.
The bss is a section which the OS prepares as zero-filled memory, because the variables were not initialized in the C/C++ source code and were thus marked as bss by the compiler/linker.
The stack is given only a small amount of memory by the OS at first; every time a function call is made, the stack pointer is pushed down so that there is more space for local variables, and popped back when the function returns.
When that first small amount of memory is full and the bottom is reached, a page fault exception occurs, the OS kernel maps new physical memory to the virtual addresses, and the user process can continue working.
No magic: in object code, every operation on a pointer returned from malloc is handled as an offset from the register value returned by the malloc call.
Actually malloc does quite complex things. There are various implementations (jemalloc/ptmalloc/dlmalloc/googlemalloc/...) that improve dynamic allocation, but they all get new memory regions from the OS using sbrk or mmap of /dev/zero or anonymous memory.
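To make the last point concrete, here is a toy bump allocator built directly on sbrk (a sketch only; real allocators track free lists, alignment, thread arenas, and so on):

#include <unistd.h> /* sbrk() */
#include <stdint.h>
#include <stddef.h>

void *toy_alloc(size_t n)
{
    void *p = sbrk((intptr_t)n);         /* move the program break up by n bytes */
    return (p == (void *)-1) ? NULL : p; /* sbrk returns the old break */
}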
Just do a man on the command readelf to find out the starting addresses of the different segments of your program.
Regarding the first question, you are absolutely right. Since most of today's systems use run-time binding, it is only during execution that the actual physical addresses are known. Moreover, it's the compiler and the loader that divide the program into different segments after linking the different libraries during compile and load time. Hence, the virtual addresses.
Coming to the second question: it happens at run time, due to runtime binding. The third question is true: all uninitialized global variables and static variables go into the BSS. Also note the special case: they go into the BSS even if they are initialized to 0.
4. If you look at the assembler code generated by gcc, you can see that memory for local variables is allocated on the stack through a push instruction or by changing the value of the ESP register. The variables are then initialized with mov or similar instructions.

Fixed address variable in C

For embedded applications, it is often necessary to access fixed memory locations for peripheral registers. The standard way I have found to do this is something like the following:
// access register 'foo_reg', which is located at address 0x100
#define foo_reg *(int *)0x100
foo_reg = 1; // write to foo_reg
int x = foo_reg; // read from foo_reg
I understand how that works, but what I don't understand is how the space for foo_reg is allocated (i.e. what keeps the linker from putting another variable at 0x100?). Can the space be reserved at the C level, or does there have to be a linker option that specifies that nothing should be located at 0x100? I'm using the GNU tools (gcc, ld, etc.), so am mostly interested in the specifics of that toolset at the moment.
Some additional information about my architecture to clarify the question:
My processor interfaces to an FPGA via a set of registers mapped into the regular data space (where variables live) of the processor. So I need to point to those registers and block off the associated address space. In the past, I have used a compiler that had an extension for locating variables from C code. I would group the registers into a struct, then place the struct at the appropriate location:
typedef struct
{
BYTE reg1;
BYTE reg2;
...
} Registers;
Registers regs _at_ 0x100;
regs.reg1 = 0;
Actually creating a 'Registers' struct reserves the space in the compiler/linker's eyes.
Now, using the GNU tools, I obviously don't have the _at_ extension. Using the pointer method:
#define reg1 (*(BYTE *)0x100)
#define reg2 (*(BYTE *)0x101)
reg1 = 0;
// or
#define regs ((Registers *)0x100)
regs->reg1 = 0;
This is a simple application with no OS and no advanced memory management. Essentially:
void main()
{
    while (1) {
        do_stuff();
    }
}
Your linker and compiler don't know about that (without you telling them anything, of course). It's up to the designer of the ABI of your platform to specify that objects are not allocated at those addresses.
So there is sometimes (the platform I worked on had this) a range in the virtual address space that is mapped directly to physical addresses, and another range that user-space processes can use to grow the stack or allocate heap memory.
You can use the defsym option with GNU ld to allocate some symbol at a fixed address:
--defsym symbol=expression
Or, if the expression is more complicated than simple arithmetic, use a custom linker script. That is the place where you can define regions of memory and tell the linker what regions should be given to what sections/objects. See the GNU ld documentation on linker scripts for an explanation. Though that is usually exactly the job of the writer of the tool-chain you use: they take the spec of the ABI and then write linker scripts and assembler/compiler back-ends that fulfill the requirements of your platform.
Incidentally, GCC has an attribute section that you can use to place your struct into a specific section. You could then tell the linker to place that section into the region where your registers live.
Registers regs __attribute__((section("REGS")));
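The matching linker-script side could look something like this hypothetical fragment (the section name and address are just this example's assumptions); NOLOAD reserves the addresses without emitting any bytes for them in the image:

SECTIONS
{
    .regs 0x100 (NOLOAD) : { *(REGS) }
}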
A linker would typically use a linker script to determine where variables would be allocated. This is called the "data" section and of course should point to a RAM location. Therefore it is impossible for a variable to be allocated at an address not in RAM.
You can read more about linker scripts in GCC here.
Your linker handles the placement of data and variables. It knows about your target system through a linker script. The linker script defines regions in a memory layout such as .text (for constant data and code) and .bss (for your global variables and the heap), and also creates a correlation between a virtual and physical address (if one is needed). It is the job of the linker script's maintainer to make sure that the sections usable by the linker do not override your IO addresses.
When the embedded operating system loads the application into memory, it will usually load it at some specified location, let's say 0x5000. All the local memory you are using will be relative to that address; that is, int x will be somewhere like 0x5000 + code size + 4, assuming it is a global variable. If it is a local variable, it's located on the stack. When you reference 0x100, you are referencing system memory space, the same space the operating system is responsible for managing, and probably a very specific place that it monitors.
The linker won't place code at specific memory locations; it works in 'relative to where my program code is' memory space.
This breaks down a little bit when you get into virtual memory, but for embedded systems, this tends to hold true.
Cheers!
Getting the GCC toolchain to give you an image suitable for use directly on the hardware without an OS to load it is possible, but involves a couple of steps that aren't normally needed for normal programs.
You will almost certainly need to customize the C run time startup module. This is an assembly module (often named something like crt0.s) that is responsible for initializing the initialized data, clearing the BSS, calling constructors for global objects if C++ modules with global objects are included, etc. Typical customizations include setting up your hardware to actually address the RAM (possibly including setting up the DRAM controller as well) so that there is a place to put data and stack. Some CPUs need to have these things done in a specific sequence: e.g. the ColdFire MCF5307 has one chip select that responds to every address after boot, which must eventually be configured to cover just the area of the memory map planned for the attached chip.
Your hardware team (or you with another hat on, possibly) should have a memory map documenting what is at various addresses. ROM at 0x00000000, RAM at 0x10000000, device registers at 0xD0000000, etc. In some processors, the hardware team might only have connected a chip select from the CPU to a device, and leave it up to you to decide what address triggers that select pin.
GNU ld supports a very flexible linker script language that allows the various sections of the executable image to be placed in specific address spaces. For normal programming, you never see the linker script since a stock one is supplied by gcc that is tuned to your OS's assumptions for a normal application.
The output of the linker is in a relocatable format that is intended to be loaded into RAM by an OS. It probably has relocation fixups that need to be completed, and may even dynamically load some libraries. In a ROM system, dynamic loading is (usually) not supported, so you won't be doing that. But you still need a raw binary image (often in a HEX format suitable for a PROM programmer of some form), so you will need to use the objcopy utility from binutils to transform the linker output to a suitable format.
So, to answer the actual question you asked...
You use a linker script to specify the target addresses of each section of your program's image. In that script, you have several options for dealing with device registers, but all of them involve putting the text, data, bss, stack, and heap segments in address ranges that avoid the hardware registers. There are also mechanisms available that can make sure that ld throws an error if you overfill your ROM or RAM, and you should use those as well.
Actually getting the device addresses into your C code can be done with #define as in your example, or by declaring a symbol directly in the linker script that is resolved to the base address of the registers, with a matching extern declaration in a C header file.
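As a hypothetical sketch of both techniques (all names and addresses here are illustrative, not taken from any real board), a GNU ld script fragment might read:

MEMORY
{
    rom (rx)  : ORIGIN = 0x00000000, LENGTH = 256K
    ram (rwx) : ORIGIN = 0x10000000, LENGTH = 64K
}

fpga_regs = 0xD0000000; /* symbol resolved by the linker to the register base */

with a matching declaration in a C header:

extern volatile unsigned char fpga_regs[]; /* address supplied by the linker */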
Although it is possible to use GCC's section attribute to define an instance of an uninitialized struct as being located in a specific section (such as FPGA_REGS), I have found that not to work well in real systems. It can create maintenance issues, and it becomes an expensive way to describe the full register map of the on-chip devices. If you use that technique, the linker script would then be responsible for mapping FPGA_REGS to its correct address.
In any case, you are going to need to get a good understanding of object file concepts such as "sections" (specifically the text, data, and bss sections at minimum), and may need to chase down details that bridge the gap between hardware and software such as the interrupt vector table, interrupt priorities, supervisor vs. user modes (or rings 0 to 3 on x86 variants) and the like.
Typically these addresses are beyond the reach of your process. So, your linker wouldn't dare put stuff there.
If the memory location has a special meaning on your architecture, the compiler should know that and not put any variables there. That would be similar to the IO-mapped space on most architectures. It has no knowledge that you're using it to store values; it just knows that normal variables shouldn't go there. Many embedded compilers support language extensions that allow you to declare variables and functions at specific locations, usually using #pragma. Also, generally the way I've seen people implement the sort of memory mapping you're trying to do is to declare an int at the desired memory location, then just treat it as a global variable. Alternately, you could declare a pointer to an int and initialize it to that address. Both of these provide more type safety than a macro.
To expand on litb's answer, you can also use the --just-symbols={symbolfile} option to define several symbols, in case you have more than a couple of memory-mapped devices. The symbol file needs to be in the format
symbolname1 = address;
symbolname2 = address;
...
(The spaces around the equals sign seem to be required.)
Often, for embedded software, you can define within the linker file one area of RAM for linker-assigned variables, and a separate area for variables at absolute locations, which the linker won't touch.
Failing to do this should cause a linker error, as the linker should spot that it's trying to place a variable at a location already used by a variable with an absolute address.
This depends a bit on what OS you are using. I'm guessing you are using something like DOS or VxWorks. Generally the system will have certain areas of the memory space reserved for hardware, and compilers for that platform will always be smart enough to avoid those areas for their own allocations. Otherwise you'd be continually writing random garbage to disk or line printers when you meant to be accessing variables.
In case something else was confusing you, I should also point out that #define is a preprocessor directive. No code gets generated for it. It just tells the compiler to textually replace any foo_reg it sees in your source file with *(int *)0x100. It is no different from typing *(int *)0x100 yourself everywhere you had foo_reg, except that it may look cleaner.
What I'd probably do instead (in a modern C compiler) is:
// access register 'foo_reg', which is located at address 0x100
volatile int *const foo_reg = (volatile int *)0x100;

*foo_reg = 1;     // write to foo_reg
int x = *foo_reg; // read from foo_reg
