What might be the point in putting a variable exactly in the "STACK" section with __attribute__((section("STACK")))?

The GCC documentation gives one reason for using the section attribute: mapping to special hardware. But this does not seem to be my case.
I have been given a task to modify a shared library that we use in our project. It is a Linux library. There are variable declarations in the library that puzzle me. They look roughly like this:
static int my_var_1 __attribute__((section("STACK"))) = 0;
Update 1:
There are a dozen variables defined this way (__attribute__((section("STACK")))).
Update 2:
my_var_1 is not a constant. my_var_1 might be changed in code during initialization:
my_var_1 = atoi(getenv("MY_VAR_1") ? getenv("MY_VAR_1") : "0");
Later in the library it is used like this:
inline void do_something() __attribute__((always_inline));
inline void do_something()
{
    if (my_var_1)
        do_something_else();
}
What might be the point in using __attribute__((section("STACK")))? I understand that section tells the compiler to put a variable in a particular section. However, what might be the point in putting a static int exactly in the "STACK" section?
Update 3
These lines are an excerpt from the output of readelf -t my_lib.so:
[23] .got.plt
PROGBITS 00000000002103f0 00000000000103f0 0
00000000000003a8 0000000000000008 0 8
[0000000000000003]: WRITE, ALLOC
[24] .data
PROGBITS 00000000002107a0 00000000000107a0 0
00000000000000b0 0000000000000000 0 16
[0000000000000003]: WRITE, ALLOC
[25] STACK
PROGBITS 0000000000210860 0000000000010860 0
00000000000860e0 0000000000000000 0 32
[0000000000000003]: WRITE, ALLOC
[26] .bss
NOBITS 0000000000296940 0000000000096940 0
0000000000000580 0000000000000000 0 32
[0000000000000003]: WRITE, ALLOC
Update 4
Managed to get information from the author of the shared library.
__attribute__((section("STACK"))) was added because he could not build the library on Solaris; this was the workaround he found. Before the workaround, my_var_1 was defined as:
int my_var_1 = 0;
and everything was OK. Then he changed it since my_var_1 was in fact needed only in this translation unit:
static int my_var_1 = 0;
After that change he could not build the library on Solaris, so he added __attribute__((section("STACK"))), and somehow it helped.

First, the STACK section will not be the stack of any running task.
Putting variables or functions in a specific section lets you choose a memory area for them (through the linker script). On some (mostly embedded) architectures, you want to put frequently accessed data in the faster memory.
Another possibility: some post-link development script sets every variable in the STACK section to 1, so development builds always call do_something_else(), while the released software keeps the default value of 0.
Yet another possibility: if there are other variables in the STACK section, the developer wants to keep them close together in memory, since all variables in the STACK section will be near each other. Maybe a cache optimization?
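To make the "keep them together" and "set them all at once" ideas concrete, here is a minimal sketch assuming GCC and GNU ld on an ELF target such as Linux. The section name `demo_flags` and the flag variables are hypothetical. For a section whose name is a valid C identifier, GNU ld defines `__start_<name>` and `__stop_<name>` symbols marking its bounds, which lets code walk every variable placed there. This is a run-time analogue of the post-link script idea, not necessarily what the library does.

```c
#include <stddef.h>

/* Hypothetical flags grouped in one named section, the way the library
   groups its "STACK" variables. The section name must be a valid C
   identifier for GNU ld to generate __start_/__stop_ symbols for it. */
int flag_verbose __attribute__((section("demo_flags"))) = 0;
int flag_trace   __attribute__((section("demo_flags"))) = 0;
int flag_checks  __attribute__((section("demo_flags"))) = 0;

/* Defined by GNU ld (ELF targets) at the start and end of the section. */
extern int __start_demo_flags[];
extern int __stop_demo_flags[];

/* Number of int-sized slots in the section. */
size_t demo_flag_count(void)
{
    return (size_t)(__stop_demo_flags - __start_demo_flags);
}

/* Sets every variable in the section, e.g. to switch on all
   development-only behaviour in one step. */
void demo_flags_set_all(int value)
{
    for (int *p = __start_demo_flags; p < __stop_demo_flags; ++p)
        *p = value;
}

/* Sums the section contents, reading only through the section bounds. */
int demo_flags_sum(void)
{
    int sum = 0;
    for (const int *p = __start_demo_flags; p < __stop_demo_flags; ++p)
        sum += *p;
    return sum;
}
```

C does not guarantee the order of the variables inside the section, so only the section bounds should be relied on.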

There may be many reasons and it is difficult to tell without details. Some of the reasons might be:
The section marked STACK might be mapped at run time to closely coupled memory with a faster access time than other RAM. It makes sense to map the stack to such RAM to avoid stalls during function calls. If you then had a variable that is accessed a lot and wanted to map it to the same fast RAM, putting it in the same section as the stack would make sense.
The section marked STACK might be mapped to a region of memory that is accessible when other parts of memory are not. For example, boot loaders need to initialize the memory controller before they can access RAM. But you really want to be able to write the code that does that in C, which requires a stack. So you find some special memory (for example, by programming the data cache into write-back mode and using it as RAM) and map the stack there, so you can run code that gets the memory controller working and makes RAM usable. Once again, if you happen to have a global variable that still needs to be accessed before RAM is available, you might decide to put it in the STACK section.
A better programmer would have renamed the STACK section to something else if it is used not only for stack.

In some operating systems, the same region of address space is used for every thread's stack; when execution switches between threads, the mapping of that space is changed accordingly. On such systems, every thread will have its own independent set of any static variables located within that region of address space. Putting variables that need to be maintained separately for each thread in such an address range avoids the need to swap them manually on each task switch.
Another occasional use for forcing variables into a stack area is to add stack sentinels (variables that can be periodically checked to see if a stack overflow clobbered them).
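A minimal sketch of stack sentinels, assuming GCC's section attribute and a downward-growing stack. `stack_area`, `GUARD_WORDS`, and `GUARD_PATTERN` are hypothetical; on a real target the linker script and startup code (not shown) would decide where the "STACK" section lives and point the stack pointer at the top of it. On a hosted system this code only demonstrates the check itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS   4
#define GUARD_PATTERN 0xDEADBEEFu  /* arbitrary, easy-to-recognize value */

/* Hypothetical stack area. On a real target the startup code would set the
   stack pointer to the top of this array, so a downward-growing stack
   reaches stack_area[0] last. */
uint32_t stack_area[256] __attribute__((section("STACK")));

/* Writes the sentinel words at the bottom (lowest addresses) of the area. */
void stack_guard_init(void)
{
    for (size_t i = 0; i < GUARD_WORDS; ++i)
        stack_area[i] = GUARD_PATTERN;
}

/* Returns false once anything has overwritten a sentinel word. */
bool stack_guard_intact(void)
{
    for (size_t i = 0; i < GUARD_WORDS; ++i)
        if (stack_area[i] != GUARD_PATTERN)
            return false;
    return true;
}
```

A periodic task or the idle loop would call `stack_guard_intact()` and report an overflow as soon as it returns false.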
A third use occurs on 8086 and 80286 platforms (probably less so on later chips in the family): the 8086 and 80286 can only efficiently access things in four segments without having to reload segment registers. If code needs to do something equivalent to
for (n=0; n<256; n++)
*dest++ = xlat[*src++];
and none of the items can be put in the code segment, being able to force one of the items into the stack segment can make the code much faster. Hand-written assembly code would be required to achieve the speedup, but the speedup can be massive (nearly a factor of two in some real-world situations I've dealt with on the 8086, and perhaps even greater in some situations on the 80286).

Related

stack memory management in embedded systems

In a course I am taking on embedded systems, certain statements lack a deep explanation, which has left me confused at some points. I would be grateful if someone could clarify them.
I have been told that, if there are initialized variables, their initialization values are stored in the code segment (maybe in flash) and are loaded (maybe to RAM) by startup routines before the program runs. This makes sense to me for global variables, as they are allocated to the .data section. I presume that global variables have a fixed address for the entire program and that the initialization value is loaded to a specific address (please correct me if I am wrong). Now, how is this done for local variables, given that they don't have a fixed address on the stack? Since local variables come into existence only during function execution, how do they get initialized each time the function is invoked?
Also, the instructor says, "The stack is reserved at compile time and the data is allocated at runtime by pre-compiled instructions". Can someone please help me understand the latter half of this statement?
Your understanding of static variables and the .data section is correct. You may also want to consider zero-initialized static variables in the .bss section. These are initialized at the same time as those in the .data section, but their initial value does not need to be stored because it is zero.
Automatic variables may be on the stack or may be optimized to live only in processor registers. Either way, the compiler generates code to initialize them each time the function using them is called. If they are on the stack, this includes an instruction that adjusts the stack pointer to "allocate" space for them when they are needed and to "free" them when they go out of scope.
The space for the entire stack is usually allocated in the linker script. In an embedded microcontroller system no instructions are necessary to "allocate" it. Depending on the hardware there may be code required to enable access to external memory, but in most cases there is a bank of fast SRAM ready to use as soon as the system powers on, and the first stack will be in this.
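A small illustration of the difference in portable C: the static local gets its initial value once, from the program image, while the automatic one is set by code the compiler emits in the function body. The function name `counter_demo` is hypothetical.

```c
/* Both variables start at 10, but they are initialized very differently. */
int counter_demo(void)
{
    static int persistent = 10; /* .data: initial value comes from the image
                                   (copied by startup code on a typical MCU) */
    int automatic = 10;         /* stack or register: set by generated code
                                   on every call */

    persistent++;
    automatic++;
    return persistent * 100 + automatic;
}
```

Compiled with optimization disabled (`-O0`), the assembly for `counter_demo` typically contains a store of 10 into `automatic` in the function body but none for `persistent`, whose 10 sits in the data section instead. Successive calls return 1111, 1211, 1311, and so on.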

Stack and heap confusion for embedded 8051

I am trying to understand a few basic concepts regarding the memory layout of the 8051 MCU architecture. I would be grateful if anyone could give me some clarifications.
So, for an 8051 MCU we have several types of memory:
IRAM - (idata) - used for general purpose registers and SFRs
PMEG - (code) - used to store code - (FLASH)
XDATA
on chip (data) - cache memory for data (RAM) /
off-chip (xdata) - external memory (RAM)
Questions:
So where is the stack actually located?
I would assume in IRAM (idata), but it's quite small (30h-7Fh): 80 bytes
What does the stack do?
Now, on one hand I read that it stores the return addresses when we call a function (e.g. when I call a function the return address is stored on the stack and the stack pointer is incremented).
http://www.alciro.org/alciro/microcontroladores-8051_24/subrutina-subprograma_357_en.htm
On the other hand I read that the stack stores our local variables from a function, variables which are "deleted" once we return from that function.
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
If I use dynamic memory allocation (heap), will that memory always be reserved in off-chip RAM (xdata), or it depends on compiler/optimization?
The 8051 has its origins in the late 1970s/early 1980s. As such, it has very limited resources. The original version did not even have XRAM, for instance; that was "patched" on later and requires special (and slow) accesses.
The IRAM is the "main memory". It really does include the stack (yes, there are only a few bytes). The rest is used for global variables ("data" and "bss" sections: initialized and uninitialized globals and statics). A compiler might use the XRAM for the same purpose.
Note that on these small MCUs you do not use many local variables (and if you do, only 8-bit types). A clever compiler/linker (I have actually used some of these) can allocate local variables statically, overlapping them, unless recursion is used (very unlikely).
Most notably, programs for such systems usually do not use a heap (i.e. dynamic memory allocation), only statically allocated memory. At most, they might use a memory pool, which provides blocks of fixed size and does not merge blocks.
Note that the IRAM includes some special registers that can be bit-addressed by the hardware. Normally, you would use a specialized compiler that can exploit these features. Very likely some features require special assembler instructions; these might be provided in a header as C functions, called intrinsics, that just generate the corresponding machine instruction.
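A minimal fixed-block pool of the kind just described, written in portable C as a sketch. The block size, block count, and function names are hypothetical. With Keil C51 you would typically add memory-type qualifiers (for example `xdata` on the pool array) to place it in external RAM. Because every block has the same size, allocation and release are constant-time list operations and there are no variable-sized fragments to merge.

```c
#include <stddef.h>

#define BLOCK_SIZE  16   /* hypothetical block size in bytes */
#define BLOCK_COUNT 8    /* hypothetical number of blocks */

/* A free block stores the link to the next free block in its own bytes. */
typedef union block {
    union block *next;              /* valid while the block is free */
    unsigned char data[BLOCK_SIZE]; /* valid while the block is in use */
} block_t;

static block_t pool[BLOCK_COUNT];
static block_t *free_list;

/* Puts every block on the free list. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; ++i) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Returns a block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Returns a block to the pool; blocks are never split or merged. */
void pool_free(void *p)
{
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}
```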
The different memory areas might also require compiler-extensions to be used.
You might have a look at sdcc for a suitable compiler.
Note also that the 8051 has an extended Harvard architecture (code and data separated, with XRAM as a third space).
Regarding your 2nd link: This is a very generalized article; it does not cover MCUs like the 8051 (or AVR, PIC and the like), but more general CPUs like x86, ARM, PowerPC, MIPS, MSP430 (which is also a smaller MCU), etc. that use an external von Neumann architecture (internally, most if not all 32-bit and wider CPUs use a Harvard architecture).
I don't have direct experience with your chips, but I have worked with very constrained systems in the past. So here is what I can answer:
Questions 1 and 2: The stack is most likely set up by a very early startup routine, which sets a register (the stack pointer) to tell the CPU where the stack should start. Typically, you want the stack in memory that is very fast to access, because compiled code pushes data onto and pops it off the stack all the time. This includes return addresses in calls, local variables, and the occasional explicit stack allocation (alloca).
For your 3rd question, the heap is placed wherever your startup routine puts it.
There is no particular area where a heap needs to live. If you want it in external memory, it can be placed there. If you want it in your really small/fast area, you can do that too, though that is probably a very bad idea. Again, your chip's/compiler's manual or included code should show you how malloc() is implemented. From there, you should be able to walk backwards to see what addresses are being passed into its memory routines.
Your IRAM is so dang small that it feels more like Instruction RAM - RAM where you would put a subroutine or two to make running code from them more efficient. 80 bytes of stack space will evaporate very quickly in a typical C function call framework. Actually, for sizes like this, you might have to hand assemble stuff to get the most out of things, but that may be beyond your scope.
If you have other questions, let me know. This is the kind of stuff I like doing :)
Update
This page has a bunch of good information on stack management for your particular chip. It appears that the stack for this chip is indeed in IRAM and is very very constrained. It also appears that assembly level coding on this chip would be the norm as this amount of RAM is quite small indeed.
Heck, this is the first system I've seen in many years that uses bank switching as a way to access more RAM. I haven't done that since the Game Boy Color's Z80-like CPU.
Concerning the heap:
There is also a malloc/free pair.
You have to call init_mempool() first, which is described in the compiler documentation but is somewhat uncommon.
The pseudo-code below illustrates this.
However, I have only used it this way and have not tried heavy use of malloc/free like you might find in dynamic linked-list management, so I have no idea what performance you get out of it.
#include <stdlib.h>  // Keil C51: malloc(), free(), init_mempool()
// A "large" area in xdata to be used as the heap
static char xdata heap_mem_pool[1000];
// A pointer located in data and pointing to something in xdata.
// Because the memory type is fixed, the pointer takes 2 bytes instead of 3
// (a generic pointer's 3rd byte stores the memory type: data, idata, xdata...).
// The "data" specifier is not mandatory but is welcome.
char xdata * data shared_memory;
//...
u16 mem_size_needed;  // u16: 16-bit unsigned typedef from the project headers
init_mempool(heap_mem_pool, sizeof(heap_mem_pool));
//..
mem_size_needed = calculate_needed_memory();  // hypothetical helper
shared_memory = malloc(mem_size_needed);
if (0 == shared_memory) return -1;
//...use the shared_memory pointer
// free it when it is no longer needed
free(shared_memory);
Some additional consequences of the fact that, on this essentially stackless microcontroller, functions are in general not reentrant (or only with some effort).
I will call "my system" the system I am working on at present: a C8051F040 (Silicon Labs) with the Keil C51 compiler (I have no specific interest in either company).
The (function return address) stack is located low in the iram (idata on my system).
If it starts at 30 (decimal), it means you have global or local variables in your code that you requested to be in data RAM (either because you chose the "small" memory model or because you used the data keyword in the variable declaration).
Whenever you call a function, the 2-byte return address of the caller is saved on this stack (the code space is 16 bits wide), and that's all: no registers are saved and no arguments are pushed onto a (non-existent) data stack. Your compiler may also limit the function call depth.
Necessary arguments and local variables (and certainly saved registers) are placed somewhere in RAM (data RAM or XRAM).
So now imagine that you want to use the same innocent function (like memcpy()) both in your interrupt and in your normal infinite loop: it will cause sporadic bugs. Why?
Due to the lack of a stack, the compiler must share RAM locations for arguments, local variables, etc. between several functions THAT DO NOT BELONG to the same call tree branch.
The pitfall is that an interrupt is its own call tree.
So if an interrupt occurs while you are executing, e.g., memcpy() in your "normal task", it may corrupt that memcpy(): when the interrupt returns, the pointers dedicated to the copy performed in the normal task hold the (end) values of the copy performed in the interrupt.
On my system I get an L15 linker error when the linker detects that a function is called from more than one independent branch.
You can make a function reentrant by adding the reentrant keyword, which requires creating an emulated stack, for example at the top of XRAM. I did not test this on my system because I am already short of XRAM, which is only 4 kB.
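The failure mode can be reproduced on a PC by modelling the compiler's behaviour. In this sketch (all names hypothetical), the "locals" of `static_copy()` are file-scope statics, standing in for the fixed RAM locations such a compiler assigns, and `pending_interrupt` simulates an interrupt that fires after one byte has been copied. When the simulated ISR calls the same function, it overwrites the shared state, and the interrupted copy silently stops early.

```c
#include <stddef.h>

/* Models a compiler with no data stack: the "locals" of static_copy()
   live at fixed addresses (file-scope statics here), shared by every call. */
static unsigned char *cp_dst;
static const unsigned char *cp_src;
static size_t cp_n;

/* Stand-in for an interrupt: if set, it runs once, after the next byte. */
static void (*pending_interrupt)(void);

void static_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    cp_dst = dst;
    cp_src = src;
    cp_n = n;
    while (cp_n > 0) {
        *cp_dst++ = *cp_src++;
        cp_n--;
        if (pending_interrupt != NULL) {
            void (*isr)(void) = pending_interrupt;
            pending_interrupt = NULL;  /* fire only once */
            isr();                     /* the ISR reuses the same "locals" */
        }
    }
}

unsigned char isr_buf[2];

/* Hypothetical ISR that also calls the non-reentrant function. */
static void demo_isr(void)
{
    static_copy(isr_buf, (const unsigned char *)"xy", 2);
}

/* Arms the simulated interrupt for the next static_copy() call. */
void arm_interrupt(void)
{
    pending_interrupt = demo_isr;
}
```

Without the interrupt, all eight bytes of a copy arrive; with it armed, only the first byte of the main copy is written, because the ISR left `cp_n` at 0.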
See link
C51: USING NON-REENTRANT FUNCTION IN MAIN AND INTERRUPTS
In the standard 8051, the stack occupies the same address space as register bank 1 (08H to 0FH) by default at startup. That means the stack pointer (SP register) holds 07H at startup (it is incremented to 08H when something is PUSHed). This effectively limits the stack to 8 bytes if register bank 2 (starting at 10H) is occupied. If register banks 2 and 3 are not used, they can also be taken up by the stack (08H to 1FH).
If in a given program we need more than 24 bytes (08 to 1FH = 24 bytes) of stack, we can change the SP to point to RAM locations 30 – 7FH. This is done with the instruction “MOV SP, #xx”. This should clarify doubts surrounding the 8051 stack usage.

Location of variables in C

I'm trying to understand how C allocates memory to global variables.
I'm working on a simple Kernel. So far it can't do much more than print to screen and enable interrupts. I'm now working on a basic physical memory manager.
My memory manager is a bitmap that sets a 1 or 0 if memory is allocated or available. I need to add the memory that my Kernel is using to the bitmap as 'allocated', so nothing overwrites it.
I can easily find out the start of the Kernel, as it's statically loaded to 0x100000. Figuring out the length shouldn't be too difficult either. The part I'm not sure about is where global variables are put in memory?
Let's say my Kernel is 12K; I can then allocate three 4K blocks of memory to it for protection. Do I need to allocate more to cover the variables it uses? Or are the variables part of that 12K?
Thank you for your help, I hope I am making enough sense.
Have a look at
http://www.geeksforgeeks.org/archives/14268
Your globals are mostly in the BSS.
As the previous answer says, most variables are stored in the .bss section, but they can also be stored in the .data or .rodata section, depending on whether they are initialized with non-zero values (.data) or declared const (.rodata). After compiling, you can use readelf -S kernel.bin to see exactly how much space each section will use. The .bss section occupies memory only once the binary is loaded and takes no space on disk. This means that your compiled kernel binary will be smaller than the memory it actually uses once loaded (usually by GRUB).
A simple way to figure out exactly how much memory your kernel will use, besides using readelf, is to place the .bss section inside the .data section in your linker script. The kernel binary will then be the same size on disk as in memory (or actually a bit smaller in memory, since not all sections are copied by GRUB), so at least you know the minimum amount of memory you need to reserve.
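To see the difference between the file image and the in-memory footprint from inside a program, here is a sketch for a hosted Linux build with the default GNU ld linker script, which provides the symbols `etext`, `edata`, and `end` (documented in `man 3 end`). In a kernel you would instead define your own boundary symbols in your linker script (for example a hypothetical `kernel_end` placed after `.bss`) and mark memory as allocated up to that address. The variable and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Provided by the default GNU ld script on Linux (see `man 3 end`):
   etext = first address past the text segment,
   edata = first address past the initialized data,
   end   = first address past the uninitialized data (.bss). */
extern char etext, edata, end;

int initialized_global = 42; /* non-zero initializer: .data, stored in the file */
int zeroed_global;           /* no initializer: .bss, takes no space in the file */

/* Bytes between edata and end: memory used at run time but not stored on disk. */
size_t bss_bytes(void)
{
    return (size_t)((uintptr_t)&end - (uintptr_t)&edata);
}

/* 1 if p lies between the end of the code and edata
   (this range also covers read-only data placed after .text). */
int in_initialized_data(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&etext && a < (uintptr_t)&edata;
}

/* 1 if p lies in the zero-filled part of the image. */
int in_bss(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&edata && a < (uintptr_t)&end;
}
```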
I'd recommend using a custom linker script (assuming you use gcc): it makes the layout of kernel sections explicit and customizable (to read more about linker scripts, read info ld). You can see an example of my OS's linker script here.
To see the default linker script, run ld with the --verbose option (-v only prints the version).
Initialized global variables are mostly located in .data.* and .rodata.* sections; variables initialized with 0 (or not initialized at all) go in .bss.

Tool to clearly visualize Memory Layout of a C Program

Suppose I have this code:
#include <stdlib.h>
void do_something(void);  // defined elsewhere
int main(void) {
    int var1;
    char *ptr = malloc(5 * sizeof(char));
    //...........
    do_something();
    //...........
    free(ptr);
    return 0;
}
We know that the actual memory layout will be divided into segments like: .text, .bss, .data, .heap, .stack.
I know how to use objdump, readelf, etc. But, I want to get a better view of the memory stack, where I can see things like:
.heap ptr
.stack do_something()
.text main()
.bss var1
The main point is: The actual variable names are missing from the output of objdump, readelf etc.
I am compiling this code with -g, thus retaining the symbol table.
Then, why am I not able to see the memory layout with local/global variable names included?
objdump -x shows the names of the variables if type is static otherwise not. Why?
There are a few methods to track memory allocation, but none of them is built in, and all of them require some additional work on your side. In order to visualise memory you will have to use code instrumentation and/or event logging, i.e. record memory allocation and deallocation events, then replay all the events and generate graphs from them.
Take a look at this paper: Visualizing Dynamic Memory Allocations (in C programs).
GCSpy (for heap visualisation) is available here: https://www.cs.kent.ac.uk/projects/gc/gcspy/. While it was initially used for the JVM, you can visualise the heap of a C program using, for instance, dlmalloc.
I completely understand why you would like to do that - I was looking for the same thing. While I don't find memory layout snapshotting very useful per se, I find observing how memory is being allocated over time very interesting and useful for debugging performance issues.
I remember that Xcode has some instrumentation tools built in; I never used them, but it may be worth exploring what they offer.
Sorry to say you're a bit confused about this. Consider:
all your functions go in the .text section
all your non-static local variables are on the stack: the fact that they may be pointers and that you intend to assign them a value returned from malloc doesn't put them on the heap; it just attempts to create a pointed-to object on the heap. No static tool looking at the binary (such as objdump or readelf) can know whether malloc will return memory or fail.
your global and static variables are likely to end up in an initialised or uninitialised data segment - which depends on whether the initial bit pattern is entirely 0s, and whether the compiler can convince itself of that at compile time.
Further, if you understand the above, then you don't need anything to draw you a little chart on a variable by variable basis, you just know instantly what type of memory you're using.
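If you still want to see concrete addresses next to names, the program itself can print them at run time, since that is the only point where locals and heap blocks exist. This sketch (all names hypothetical) prints the address of one object of each kind, annotated with where it normally lives.

```c
#include <stdio.h>
#include <stdlib.h>

int global_init = 1;  /* non-zero initializer: .data */
int global_zero;      /* no initializer: .bss */

/* Prints the run-time address of one object of each kind.
   Returns 0 on success, -1 if malloc fails. */
int print_layout(void)
{
    static int static_local = 2; /* .data, despite being declared in a function */
    int local = 3;               /* automatic; taking its address keeps it on the stack */
    char *ptr = malloc(5);       /* ptr itself is automatic; *ptr is on the heap */

    if (ptr == NULL)
        return -1;

    printf("global_init  (.data) %p\n", (void *)&global_init);
    printf("global_zero  (.bss)  %p\n", (void *)&global_zero);
    printf("static_local (.data) %p\n", (void *)&static_local);
    printf("local        (stack) %p\n", (void *)&local);
    printf("ptr          (stack) %p\n", (void *)&ptr);
    printf("*ptr         (heap)  %p\n", (void *)ptr);

    free(ptr);
    return 0;
}
```

For global and static objects, `nm` on the executable lists the same names with their link-time addresses (unrelocated, for a position-independent executable); locals and heap blocks only exist while the program runs, so no tool that inspects the binary can report them.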

Fixed address variable in C

For embedded applications, it is often necessary to access fixed memory locations for peripheral registers. The standard way I have found to do this is something like the following:
// access register 'foo_reg', which is located at address 0x100
#define foo_reg *(int *)0x100
foo_reg = 1; // write to foo_reg
int x = foo_reg; // read from foo_reg
I understand how that works, but what I don't understand is how the space for foo_reg is allocated (i.e. what keeps the linker from putting another variable at 0x100?). Can the space be reserved at the C level, or does there have to be a linker option that specifies that nothing should be located at 0x100. I'm using the GNU tools (gcc, ld, etc.), so am mostly interested in the specifics of that toolset at the moment.
Some additional information about my architecture to clarify the question:
My processor interfaces to an FPGA via a set of registers mapped into the regular data space (where variables live) of the processor. So I need to point to those registers and block off the associated address space. In the past, I have used a compiler that had an extension for locating variables from C code. I would group the registers into a struct, then place the struct at the appropriate location:
typedef struct
{
BYTE reg1;
BYTE reg2;
...
} Registers;
Registers regs _at_ 0x100;
regs.reg1 = 0;
Actually creating a 'Registers' struct reserves the space in the compiler/linker's eyes.
Now, using the GNU tools, I obviously don't have the _at_ extension. Using the pointer method:
#define reg1 (*(volatile BYTE *)0x100)
#define reg2 (*(volatile BYTE *)0x101)
reg1 = 0;
// or
#define regs ((volatile Registers *)0x100)
regs->reg1 = 0;
This is a simple application with no OS and no advanced memory management. Essentially:
void main()
{
while(1){
do_stuff();
}
}
Your linker and compiler don't know about that (without you telling them, of course). It's up to the designer of your platform's ABI to specify that they don't allocate objects at those addresses.
So there is sometimes a range in the virtual address space (the platform I worked on had one) that is mapped directly to physical addresses, and another range that can be used by user-space processes to grow the stack or to allocate heap memory.
You can use the --defsym option of GNU ld to place a symbol at a fixed address:
--defsym symbol=expression
Or if the expression is more complicated than simple arithmetic, use a custom linker script. That is the place where you can define regions of memory and tell the linker what regions should be given to what sections/objects. See here for an explanation. Though that is usually exactly the job of the writer of the tool-chain you use. They take the spec of the ABI and then write linker scripts and assembler/compiler back-ends that fulfill the requirements of your platform.
Incidentally, GCC has an attribute section that you can use to place your struct into a specific section. You could then tell the linker to place that section into the region where your registers live.
Registers regs __attribute__((section("REGS")));
A linker would typically use a linker script to determine where variables would be allocated. This is called the "data" section and of course should point to a RAM location. Therefore it is impossible for a variable to be allocated at an address not in RAM.
You can read more about linker scripts in GCC here.
Your linker handles the placement of data and variables. It knows about your target system through a linker script. The linker script defines sections in the memory layout, such as .text (code, and often constant data) and .bss (zero-initialized global variables; the heap is often placed right after it), and also maps virtual to physical addresses (if needed). It is the job of the linker script's maintainer to make sure that the sections usable by the linker do not overlap your I/O addresses.
When the embedded operating system loads the application into memory, it will usually load it at some specified location, let's say 0x5000. All the local memory you are using will be relative to that address; that is, int x will be somewhere like 0x5000 + code size + 4, assuming it is a global variable. If it is a local variable, it's located on the stack. When you reference 0x100, you are referencing system memory space, the same space the operating system is responsible for managing, and probably a very specific place that it monitors.
The linker won't place code at specific memory locations; it works in a "relative to where my program code is" memory space.
This breaks down a little bit when you get into virtual memory, but for embedded systems, this tends to hold true.
Cheers!
Getting the GCC toolchain to give you an image suitable for use directly on the hardware without an OS to load it is possible, but involves a couple of steps that aren't normally needed for normal programs.
You will almost certainly need to customize the C run-time startup module. This is an assembly module (often named something like crt0.s) that is responsible for initializing the initialized data, clearing the BSS, calling constructors for global objects if C++ modules with global objects are included, etc. Typical customizations include setting up your hardware to actually address the RAM (possibly including setting up the DRAM controller as well) so that there is a place to put data and stack. Some CPUs need these things done in a specific sequence: e.g. the ColdFire MCF5307 has one chip select that responds to every address after boot, which eventually must be configured to cover just the area of the memory map planned for the attached chip.
Your hardware team (or you with another hat on, possibly) should have a memory map documenting what is at various addresses. ROM at 0x00000000, RAM at 0x10000000, device registers at 0xD0000000, etc. In some processors, the hardware team might only have connected a chip select from the CPU to a device, and leave it up to you to decide what address triggers that select pin.
GNU ld supports a very flexible linker script language that allows the various sections of the executable image to be placed in specific address spaces. For normal programming, you never see the linker script since a stock one is supplied by gcc that is tuned to your OS's assumptions for a normal application.
The output of the linker is in a relocatable format that is intended to be loaded into RAM by an OS. It probably has relocation fixups that need to be completed, and may even dynamically load some libraries. In a ROM system, dynamic loading is (usually) not supported, so you won't be doing that. But you still need a raw binary image (often in a HEX format suitable for a PROM programmer of some form), so you will need to use the objcopy utility from binutils to transform the linker output to a suitable format.
So, to answer the actual question you asked...
You use a linker script to specify the target addresses of each section of your program's image. In that script, you have several options for dealing with device registers, but all of them involve putting the text, data, bss, stack, and heap segments in address ranges that avoid the hardware registers. There are also mechanisms that make ld throw an error if you overfill your ROM or RAM, and you should use those as well.
Actually getting the device addresses into your C code can be done with #define as in your example, or by declaring a symbol directly in the linker script that is resolved to the base address of the registers, with a matching extern declaration in a C header file.
Although it is possible to use GCC's section attribute to define an instance of an uninitialized struct as being located in a specific section (such as FPGA_REGS), I have found that not to work well in real systems. It can create maintenance issues, and it becomes an expensive way to describe the full register map of the on-chip devices. If you use that technique, the linker script would then be responsible for mapping FPGA_REGS to its correct address.
In any case, you are going to need to get a good understanding of object file concepts such as "sections" (specifically the text, data, and bss sections at minimum), and may need to chase down details that bridge the gap between hardware and software such as the interrupt vector table, interrupt priorities, supervisor vs. user modes (or rings 0 to 3 on x86 variants) and the like.
Typically these addresses are beyond the reach of your process. So, your linker wouldn't dare put stuff there.
If the memory location has a special meaning on your architecture, the compiler should know that and not put any variables there. That would be similar to the IO mapped space on most architectures. It has no knowledge that you're using it to store values, it just knows that normal variables shouldn't go there. Many embedded compilers support language extensions that allow you to declare variables and functions at specific locations, usually using #pragma. Also, generally the way I've seen people implement the sort of memory mapping you're trying to do is to declare an int at the desired memory location, then just treat it as a global variable. Alternately, you could declare a pointer to an int and initialize it to that address. Both of these provide more type safety than a macro.
To expand on litb's answer, you can also use the --just-symbols={symbolfile} option to define several symbols, in case you have more than a couple of memory-mapped devices. The symbol file needs to be in the format
symbolname1 = address;
symbolname2 = address;
...
(The spaces around the equals sign seem to be required.)
Often, for embedded software, you can define within the linker file one area of RAM for linker-assigned variables, and a separate area for variables at absolute locations, which the linker won't touch.
Failing to do this should cause a linker error, as it should spot that it's trying to place a variable at a location already being used by a variable with absolute address.
This depends a bit on what OS you are using. I'm guessing you are using something like DOS or VxWorks. Generally the system will have certain areas of the memory space reserved for hardware, and compilers for that platform will always be smart enough to avoid those areas for their own allocations. Otherwise you'd be continually writing random garbage to disk or line printers when you meant to be accessing variables.
In case something else was confusing you, I should also point out that #define is a preprocessor directive. No code gets generated for that. It just tells the compiler to textually replace any foo_reg it sees in your source file with *(int *)0x100. It is no different than just typing *(int *)0x100 in yourself everywhere you had foo_reg, other than it may look cleaner.
What I'd probably do instead (in a modern C compiler) is:
// access register 'foo_reg', which is located at address 0x100
// the pointer itself is const; volatile makes every access reach the hardware
volatile int *const foo_reg = (volatile int *)0x100;
*foo_reg = 1;      // write to foo_reg
int x = *foo_reg;  // read from foo_reg
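Putting the answers' advice together (`volatile` accesses, a typed constant pointer instead of a macro), here is a sketch that also lets the driver code be tested on a PC. The `Registers` layout, the `TARGET_HW` macro, and `fpga_start()` are hypothetical; 0x100 is the address from the question. None of this reserves the address range: keeping the linker away from it is still the linker script's job.

```c
#include <stdint.h>

/* Hypothetical layout of the FPGA register block from the question. */
typedef struct {
    volatile uint8_t reg1;
    volatile uint8_t reg2;
} Registers;

#ifdef TARGET_HW
/* Real board: the block sits at 0x100 (address taken from the question). */
static Registers *const regs = (Registers *)0x100;
#else
/* Host build: ordinary RAM stands in for the device, so the driver logic
   can be unit-tested without hardware. */
static Registers fake_regs;
static Registers *const regs = &fake_regs;
#endif

/* Hypothetical driver routine: clear reg1, set bit 7 of reg2. */
void fpga_start(void)
{
    regs->reg1 = 0;
    regs->reg2 |= 0x80;
}
```

Compared with the macro, the compiler now type-checks every access through `regs`, and the host build exercises exactly the same driver code as the target build.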

Resources