How to store a variable at a specific memory location in C?

As I am relatively new to C, I have to do the following for one of my projects:
I must declare some global variables which have to be stored at the same memory address every time the program runs.
I did some reading and found that if I declare a variable "static" it will be stored at the same memory location.
But my question is: can I tell the program where to store that variable or not?
For example: int a to be stored at 0xff520000. Can this be done or not? I have searched here but did not find any relevant example. If there is some old post regarding this, please be so kind to share the link.
Thank you all in advance.
Laurentiu
Update: I am using a 32-bit microcontroller.

In your IDE there will be a memory map available through some linker file. It will contain all addresses in the program. Read the MCU manual to see at which addresses there is valid memory for your purpose, then reserve some of that memory for your variable. You have to read the documentation of your specific development platform.
Next, please note that it doesn't make much sense to map variables at specific addresses unless they are either hardware registers or non-volatile variables residing in flash or EEPROM.
If the contents of such a memory location will change during execution, because it is a register, or because your program contains a bootloader/NVM programming algorithm changing NVM memory cells, then the variables must be declared as volatile. Otherwise the compiler will break your code completely upon optimization.
The particular compiler most likely has a non-standard way to allocate variables at specific addresses, such as a #pragma or sometimes the weird, non-standard # operator. The only sensible way you can allocate a variable at a fixed location in standard C is this:
#define MY_REGISTER (*(volatile uint8_t*)0x12345678u)
where 0x12345678 is the address where that one byte is located (uint8_t requires <stdint.h>). Once you have a macro declaration like this, you can use it as if it were a variable:
void func (void)
{
    MY_REGISTER = 1;        // write
    int var = MY_REGISTER;  // read
}
Most often you want these kind of variables to reside in the global namespace, hence the macro. But if you for some reason want the scope of the variable to be reduced, then skip the macro and access the address manually inside the code:
void func (void)
{
    *(volatile uint8_t*)0x12345678u = 1;        // write
    int var = *(volatile uint8_t*)0x12345678u;  // read
}
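A closely related variant, shown here only as a sketch with made-up register names and a made-up base address of 0x40000000u, groups several registers into a struct and maps the whole block through one cast; vendor device headers are often written this way:
#include <stdint.h>

typedef struct
{
    volatile uint32_t CTRL;    // offset 0x00
    volatile uint32_t STATUS;  // offset 0x04
    volatile uint32_t DATA;    // offset 0x08
} PeripheralRegs;

#define PERIPHERAL ((PeripheralRegs *)0x40000000u)

void enable_peripheral (void)
{
    PERIPHERAL->CTRL = 1u;                    // write a register
    while ((PERIPHERAL->STATUS & 1u) == 0u)   // poll a status bit
    {
    }
}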

You can do this kind of thing with linker scripts, which is quite common in embedded programming.
On a Linux system you might never get the same virtual address due to address space randomization (a security feature to avoid exploits that would rely on knowing the exact location of a variable like you describe).
If it's just a repeatable pointer you want, you may be able to map a specific address with mmap, but that's not guaranteed.
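For illustration, a minimal sketch of the mmap idea, assuming Linux (MAP_ANONYMOUS) and a made-up hint address: without MAP_FIXED the first argument is only a hint, and with MAP_FIXED the call either maps exactly there or fails (and may clobber existing mappings), so neither form is a portable guarantee.
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    void *wanted = (void *)0x10000000;   // arbitrary hint address
    void *p = mmap(wanted, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }
    printf("requested %p, got %p\n", wanted, p);
    return 0;
}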

As was mentioned in other answers - you can't.
But you can have a workaround. If it's OK for the globals to be initialized in main(), you can do something of this kind:
#include <stdint.h>   // for uintptr_t

uintptr_t addr = 0xff520000u;

int main(void)
{
    *((volatile int*)addr) = 42;
    ...
    return 0;
}
Note, however, that this is very dependent on your system, and if you are running in a protected environment you'll most likely get a runtime crash. If you're in an embedded/non-protected environment, this can work.

No, you cannot explicitly tell the compiler where to store a variable in memory, mostly because on modern systems many things regarding memory are done by the system and are out of your control. Address Space Layout Randomization is one thing that comes to mind that would make this very hard.

It depends on your compiler. If you use the XC8 compiler, you can simply write:
int x # 0x12;
This line places x at memory location 0x12.

Not at the C level. If you work with assembly language, you can directly control the memory layout. But the C compiler does this for you. You can't really mess with it.
Even with assembly, this only controls the relative layout. Virtual memory may place this at any (in)convenient physical location.

You can do this with some compiler extensions, but it's probably not what you want to do. The operating system handles your memory and will put things where it wants. How do you even know that the memory address you want will be mapped into your program? Ignore everything in this paragraph if you're on an embedded platform; in that case you should read the manual for that platform/compiler, or at least mention it here so that people can give a more specific answer.
Also, static variables don't necessarily have the same address when the program runs. Many operating systems use position independent executables and randomize the address space on every execution.
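As a quick illustration (a minimal sketch, assuming nothing beyond a hosted C environment): print the address of a static variable and run the program a few times. With ASLR and a position independent executable the value will usually differ between runs.
#include <stdio.h>

static int persistent;

int main(void)
{
    printf("address of persistent: %p\n", (void *)&persistent);
    return 0;
}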

You can declare a pointer to a specific memory address and use the contents of that pointer as a variable, I suppose:
int* myIntPointer = (int*)0xff520000;

Related

How to get the beginning address and ending address of a stack in C

For some reason, I want to get the address range of the stack. For example, consider the following code:
int main(){
    int a = 0;
    int b = 0;
}
Is there any generic way I can know the addresses of a and b (and any other variable on the stack), without explicitly using &a in the code?
Thanks!
Memory addresses in general, and stacks in particular, are system specific. There exists no way to obtain such information in standard C, nor is there a way to set the stack pointer in C.
In fact if you don't use the & operator, the variables are quite likely to get allocated in registers instead of the stack.
For the rare case where you actually need to know the stack address, for example when dealing with low level embedded systems, you'd typically go check a linker script and hardcode the value, or use some specific non-standard compiler extension.
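For example, a minimal sketch of the non-standard route, assuming GCC or Clang: __builtin_frame_address(0) is an extension that returns the current function's frame address, which gives a rough idea of where the stack currently is. It is not the exact address of a or b, and note that taking &a and &b below is itself what forces them onto the stack.
#include <stdio.h>

int main(void)
{
    int a = 0;
    int b = 0;

    void *frame = __builtin_frame_address(0);  // GCC/Clang extension
    printf("approximate stack frame: %p\n", frame);
    printf("&a = %p, &b = %p\n", (void *)&a, (void *)&b);
    return 0;
}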

How do you know the exact address of a variable?

So I'm looking through my C programming text book and I see this code.
#include <stdio.h>

int j, k;
int *ptr;

int main(void)
{
    j = 1;
    k = 2;
    ptr = &k;

    printf("\n");
    printf("j has the value %d and is stored at %p\n", j, (void *)&j);
    printf("k has the value %d and is stored at %p\n", k, (void *)&k);
    printf("ptr has the value %p and is stored at %p\n", (void *)ptr, (void *)&ptr);
    printf("The value of the integer pointed to by ptr is %d\n", *ptr);

    return 0;
}
I ran it and the output was:
j has the value 1 and is stored at 0x4030e0
k has the value 2 and is stored at 0x403100
ptr has the value 0x403100 and is stored at 0x4030f0
The value of the integer pointed to by ptr is 2
My question is: if I had not run this through a compiler, how would you know the addresses of those variables just by looking at this code? I'm just not sure how to get the actual address of a variable. Thanks!
Here's my understanding of it:
The absolute addresses of things in memory in C are unspecified. This is not standardised in the language, so you can't know the locations of things in memory by looking at just the code. (However, if you use the same compiler, code, compiler options, runtime and operating system, the addresses may be consistent.)
When you're developing applications, this is not behaviour you should rely on. You may rely on the difference between the locations of two things in some contexts, however. For example, you can compute the difference between pointers to two array elements to determine how many elements apart they are.
By the way, if you are considering using the memory locations of variables to solve a particular problem, you may find it helpful to post a separate question asking how to do so without relying on this behaviour.
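A minimal sketch of the one address-related calculation that is well defined: the difference between two pointers into the same array. The absolute addresses change from build to build and run to run, but the element distance does not.
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    int arr[10];
    int *p = &arr[2];
    int *q = &arr[7];

    ptrdiff_t distance = q - p;   // always 5, regardless of where arr ends up
    printf("p = %p, q = %p, %td elements apart\n", (void *)p, (void *)q, distance);
    return 0;
}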
There is no other way to "know the exact address" of a variable in Standard C than to print it with "%p". The actual address is determined by many factors not under control of the programmer writing code. It's a matter of OS, the linker, the compiler, options used and probably others.
That said, in the embedded systems world, there are ways to express this variable must reside at this address, for example if registers of external devices are mapped into the address space of a running program. This usually happens in what is called a linker file or map file or by assigning an integral value to a pointer (with a cast). All of these methods are non-standard.
For the purpose of your everyday garden-variety programs though, the point of writing C is that you need not and should not care where your variables are stored.
You can't.
Different compilers can put the variables in different places. On some machines the address is not a simple integer anyway.
The compiler only knows things like "the third integer global variable" and "the four bytes allocated 36 bytes down from the stack pointer." It refers to global vars, pointers to subroutines (functions), subroutine arguments and local vars only in relative terms. (Never mind the extra stuff for polymorphic objects in C++, yikes!) These relative references are saved in the object file (.o or .obj) as special codes and offset values.
The Linker can fill in some details. It may modify some of these sketchy location references when joining several object files. Global variable locations will share a space (the Data Section) when globals from multiple compilation units are merged; the linker decides what order they all go in, but still describing them as relative to the start of the entire set of global vars. The result is an executable file with the final opcodes, but addresses still being sketchy and based on relative offsets.
It's not until the executable is loaded that the Loader replaces all the relative addresses with actual addresses. This is possible now, because the loader (or some part of the operating system it depends on) decides where in the whole virtual address space of the process to store the program's opcodes (Text Section), global variables (BSS, Data Sections) and call stack, and other things. The loader can do the math, and write the actual address into every spot in the executable, typically as part of "load immediate" opcodes and all opcodes involving memory access.
Google "relocation table" for more. See http://www.iecc.com/linker/linker07.html (somewhat old) for a more detailed explanation for particular platforms.
In real life, it's all complicated by the fact that virtual addresses are mapped to physical addresses by a virtual memory system, using segments or some other mechanism to keep each process in a separate address space.
I would like to build further upon the answers already provided by pointing out that some compilers, such as Visual Studio's, have a feature called Address Space Layout Randomization (ASLR), which makes programs begin at a random memory address as an exploit-mitigation (security) feature. Given the addresses in your output, I'd say that you compiled without it (programs without it start at address 0x400000, I think). My source for this information is an answer to this question.
That said, the compiler is what determines the memory addresses at which local variables will be stored. The addresses will most likely change from compiler to compiler, and probably also with each version of the source code.
Every process has its own logical address space starting from zero. Addresses your program can access are all relative to zero. The absolute address of any memory location is decided only after the process is loaded into main memory. This is done using dynamic relocation by modern operating systems. Hence, every time a process is loaded into memory it may be loaded at a different location according to the availability of memory. So allowing user processes to know the exact address of data stored in memory does not make much sense. What your code is printing is a logical address, not the exact or physical address.
Continuing from the answers above, please do not forget that processes run in their own virtual address space (process isolation). This ensures that when your program corrupts some memory, the other running processes are not affected.
Process Isolation:
http://en.wikipedia.org/wiki/Process_isolation
Inter-Process Communication
http://en.wikipedia.org/wiki/Inter-process_communication

Declare a pointer to an integer at address 0x200 in memory

I have a couple of doubts. I remember reading somewhere that it is not possible to manually put a variable at a particular location in memory, but then I came across this code:
#include <stdio.h>

void main()
{
    int *x;
    x = 0x200;
    printf("Number is %lu", x); // Checkpoint1
    scanf("%d", x);
    printf("%d", *x);
}
Is it that we cannot put it at a particular location, or that we should not put it at a particular location since we will not know whether it's a valid location or not?
Also, in this code, up to the first checkpoint, I get the output 512.
And then after that, a seg fault.
Can someone explain why? Is 0x200 not a valid memory location?
In the general case, the behavior you will get is undefined - anything can happen.
In Linux, for example, on 32-bit x86 the kernel traditionally reserves 1GB of the 4GB virtual address space for itself, and the low pages of a process are normally left unmapped so that stray accesses through null (or near-null) pointers such as 0x200 are caught - hence the seg fault.
No idea how it works in Windows.
Reference for the Linux claim:
Currently the 32 bit x86 architecture is the most popular type of
computer. In this architecture, traditionally the Linux kernel has
split the 4GB of virtual memory address space into 3GB for user
programs and 1GB for the kernel.
Adding to what @amit wrote:
In Windows it is the same. In general it is the same for all protected-mode operating systems. Since DOS etc. are no longer around, it is the same on all systems except kernel mode (kernel-mode drivers) and embedded systems.
The operating system manages which memory pages you are allowed to write to, and sets up protections so that the CPU automatically raises an access violation if any other page is written to.
Up until the "checkpoint", you haven't accessed memory location 0x200, so everything works fine.
There is a local variable x in the function main. It is of type "pointer to int". x is assigned the value 0x200, and then that value is printed. But the target of x hasn't been accessed, so up to this point it doesn't matter whether x holds a valid memory address or not.
Then scanf tries to write to the memory address you passed in, which is the 0x200 stored in x. That gives you a seg fault, which is certainly a possible result of trying to write to an arbitrary memory address.
So what are your doubts? What makes you think that this might work, when you come across this code that clearly doesn't?
Writing to a particular memory address might work under certain conditions, but is extremely unlikely to in general. Under all modern OSes, normal programs do not have control over their memory layout. The OS decides where initial things like the program's code, stack, and globals go. The OS will probably also be using some memory space, and it is not required to tell you what it's using. Instead you ask for memory (either by making variables or by calling memory allocation routines), and you use that.
So writing to particular addresses is very, very likely to hit either memory that hasn't been allocated, or memory that is being used for some other purpose. Neither of those is good, even if you do manage to hit an address that is actually writable. What if you clobber some piece of data used by one of your program's other variables? Or some other part of your program clobbers the value you just wrote?
You should never be choosing a particular hard-coded memory address, you should be using an address of something you know is a variable, or an address you got from something like malloc.
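A minimal corrected sketch of the snippet above, pointing x at memory the program actually owns (a local variable here; malloc would work just as well) instead of the arbitrary address 0x200:
#include <stdio.h>

int main(void)
{
    int value = 0;
    int *x = &value;                  // or: x = malloc(sizeof *x);

    printf("Number is %p\n", (void *)x);
    if (scanf("%d", x) == 1)
        printf("%d\n", *x);
    return 0;
}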

const value vs. #define, which kind of chip resource will be used?

If I define a macro, or use a static const value, in an embedded system, which kind of memory will be used: chip flash or chip RAM?
Which way is better?
I believe the answer is more complex than that.
Edit: I apologise for using 'should' and 'might', but without a specific compiler or debugger I find it hard to be accurate and precise. Maybe if the question said what compiler and platform is targeted, we could be clearer?
#define NAME ((type_cast)value) consumes no space until it appears in the code. The compiler may be able to deduce something using its value (compared to using a variable with an unknown run-time value), and hence might change the generated code so that it effectively consumes no space, or may even reduce the size of the code. If the compiler's analysis is that the literal value will be needed at run time, then it consumes code space. The literal value is known, so the compiler should be able to allocate the optimum amount of space. Depending on the processor, it should be stored in flash, but it might not be in-line in the code; instead it may sit in a 'literal pool', a block of constants placed near the code so that compact addressing can be used. The compiler will likely make good decisions.
static const type name = value; should not consume space until it is used in the code. Even when it is used in code, it might or might not consume 'space' depending on your compiler (and, I think, the C standard it is compiling) and how the code uses the value.
If the address of the name is never taken, then the compiler does not have to store it. If the address of the value is taken (and that code is not eliminated) then the value must be in memory. Smart compilers will detect whether or not any code in the source file takes its address. Even though it might be stored, the compiler might generate better (faster or more compact code) by not using the stored value.
The compiler might do as good a job with this as with #define NAME, though it might also do worse.
If the value has its address taken, then the compiler treats the variable as an initialised variable, which consumes space to store the constant value. The compiler doesn't really put values into RAM or flash; that depends on the linker. In gcc, there are 'attributes' which can be used to tell the linker which segment to put a variable into. By default the compiler puts initialised variables into the default data segment, and initialised const into a read-only segment. By using attributes, the programmer can put variables into any segment. Using an appropriate linker script, which usually comes with the toolchain, segments can be placed in flash. Gcc uses the read-only data segment for data like literal strings.
name should be available in a debugger, but the #define NAME will not be.
There is a third approach, which is to use enums:
enum CONSTANTS { name = 1234, height = 456 ... };
These may be treated by the compiler like #define constants, though they are not quite as flexible because they are int sized (IIRC). There is no way to take the address of an enum value, so the compiler has as many options to generate good code as with a #define NAME. They will often be available in the debugger.
const type name = value; may consume RAM. It must be in memory because the compiler can't know whether code in a different file uses it or takes its address (though gcc LTO might change that). The const tells the compiler to warn (or error) where any code tries to change the value, e.g. using an assignment operator. Normally variables held in RAM are stored in the data or bss memory segments. By default gcc puts const into a read-only segment; the segment can be set using the command line option -mrodata=readonly-data-section. That segment is .rodata on ARM.
On embedded systems, initialised global and static variables have their initial values held in flash and copied to RAM when the program starts (before main() is called). All uninitialised global or static variables are set to 0 before main() is called.
The compiler might put const variables into their own memory segment (gcc does), which may allow a linker script (e.g. for ld) to put them into flash and not allocate any RAM to them (this wouldn't work on e.g. AVR ATmega, which uses different instructions to load data from flash).
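A minimal sketch pulling the three forms discussed above together (all names are made up); which of them ends up occupying flash or RAM depends on the compiler, the linker script, and whether the const object ever has its address taken:
#include <stdint.h>

#define TIMEOUT_MS  1000u               // pure text substitution, no storage of its own

static const uint16_t retry_count = 3; // may be folded into code, or placed in a
                                       // read-only/flash section if its address is taken

enum { BUFFER_SIZE = 128 };            // an int constant: no storage, no address,
                                       // usable where a constant expression is required

static uint8_t buffer[BUFFER_SIZE];

uint16_t get_timeout(void)
{
    return (uint16_t)(TIMEOUT_MS / (retry_count + 1u));
}

uint8_t *get_buffer(void)
{
    return buffer;
}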
Well, if you #define a macro, no additional memory or code space (flash) is allocated for it; all the work is done at compile time.
If you use a static const global variable, code is generated for the initial value and memory is allocated for it, so both flash (the binary gets bigger) and memory (chip RAM) are used.
In addition to what others said:
Using #define tells nothing about a variable. The define itself needs no storage, but if you do something like int x = MY_DEFINE, that of course creates a variable, and it will be non-const.
On some toolchains/systems you can actually place const variables into a special section which you can put into flash/ROM, typically using a custom linker script or compiler switches.
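A minimal sketch of that last point, assuming GCC and a made-up section name; a matching custom linker script would then place the .calib section into flash:
// GCC-specific: put this const table into a named section instead of the
// default read-only data section.
static const unsigned char calibration_table[16]
    __attribute__((section(".calib"))) =
{
    0x01, 0x02, 0x03, 0x04
};

unsigned char read_calibration(unsigned index)
{
    return calibration_table[index & 0x0Fu];
}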

Fixed address variable in C

For embedded applications, it is often necessary to access fixed memory locations for peripheral registers. The standard way I have found to do this is something like the following:
// access register 'foo_reg', which is located at address 0x100
#define foo_reg *(int *)0x100
foo_reg = 1; // write to foo_reg
int x = foo_reg; // read from foo_reg
I understand how that works, but what I don't understand is how the space for foo_reg is allocated (i.e. what keeps the linker from putting another variable at 0x100?). Can the space be reserved at the C level, or does there have to be a linker option that specifies that nothing should be located at 0x100? I'm using the GNU tools (gcc, ld, etc.), so I'm mostly interested in the specifics of that toolset at the moment.
Some additional information about my architecture to clarify the question:
My processor interfaces to an FPGA via a set of registers mapped into the regular data space (where variables live) of the processor. So I need to point to those registers and block off the associated address space. In the past, I have used a compiler that had an extension for locating variables from C code. I would group the registers into a struct, then place the struct at the appropriate location:
typedef struct
{
    BYTE reg1;
    BYTE reg2;
    ...
} Registers;

Registers regs _at_ 0x100;

regs.reg1 = 0;
Actually creating a 'Registers' struct reserves the space in the compiler/linker's eyes.
Now, using the GNU tools, I obviously don't have the at extension. Using the pointer method:
#define reg1 (*(BYTE*)0x100)
#define reg2 (*(BYTE*)0x101)

reg1 = 0;

// or

#define regs ((Registers*)0x100)

regs->reg1 = 0;
This is a simple application with no OS and no advanced memory management. Essentially:
void main()
{
    while(1) {
        do_stuff();
    }
}
Your linker and compiler don't know about that (without you telling them anything, of course). It's up to the designer of the ABI of your platform to specify that they don't allocate objects at those addresses.
So, there is sometimes (the platform I worked on had that) a range in the virtual address space that is mapped directly to physical addresses, and another range that can be used by user-space processes to grow the stack or to allocate heap memory.
You can use the defsym option with GNU ld to allocate some symbol at a fixed address:
--defsym symbol=expression
Or if the expression is more complicated than simple arithmetic, use a custom linker script. That is the place where you can define regions of memory and tell the linker what regions should be given to what sections/objects. See here for an explanation. Though that is usually exactly the job of the writer of the tool-chain you use. They take the spec of the ABI and then write linker scripts and assembler/compiler back-ends that fulfill the requirements of your platform.
Incidentally, GCC has an attribute section that you can use to place your struct into a specific section. You could then tell the linker to place that section into the region where your registers live.
Registers regs __attribute__((section("REGS")));
A linker would typically use a linker script to determine where variables would be allocated. This is called the "data" section and of course should point to a RAM location. Therefore it is impossible for a variable to be allocated at an address not in RAM.
You can read more about linker scripts in GCC here.
Your linker handles the placement of data and variables. It knows about your target system through a linker script. The linker script defines regions in a memory layout such as .text (for constant data and code) and .bss (for your global variables and the heap), and also creates a correlation between a virtual and physical address (if one is needed). It is the job of the linker script's maintainer to make sure that the sections usable by the linker do not overlap your IO addresses.
When the embedded operating system loads the application into memory, it will usually load it at some specified location, let's say 0x5000. All the local memory you are using will be relative to that address; that is, int x will be somewhere like 0x5000 + code size + 4... assuming this is a global variable. If it is a local variable, it's located on the stack. When you reference 0x100, you are referencing system memory space, the same space the operating system is responsible for managing, and probably a very specific place that it monitors.
The linker won't place code at specific memory locations, it works in 'relative to where my program code is' memory space.
This breaks down a little bit when you get into virtual memory, but for embedded systems, this tends to hold true.
Cheers!
Getting the GCC toolchain to give you an image suitable for use directly on the hardware without an OS to load it is possible, but involves a couple of steps that aren't normally needed for normal programs.
You will almost certainly need to customize the C run time startup module. This is an assembly module (often named something like crt0.s) that is responsible for initializing the initialized data, clearing the BSS, calling constructors for global objects if C++ modules with global objects are included, etc. Typical customizations include the need to set up your hardware to actually address the RAM (possibly including setting up the DRAM controller as well) so that there is a place to put data and stack. Some CPUs need to have these things done in a specific sequence: e.g. the ColdFire MCF5307 has one chip select that responds to every address after boot, which eventually must be configured to cover just the area of the memory map planned for the attached chip.
Your hardware team (or you with another hat on, possibly) should have a memory map documenting what is at various addresses. ROM at 0x00000000, RAM at 0x10000000, device registers at 0xD0000000, etc. In some processors, the hardware team might only have connected a chip select from the CPU to a device, and leave it up to you to decide what address triggers that select pin.
GNU ld supports a very flexible linker script language that allows the various sections of the executable image to be placed in specific address spaces. For normal programming, you never see the linker script since a stock one is supplied by gcc that is tuned to your OS's assumptions for a normal application.
The output of the linker is in a relocatable format that is intended to be loaded into RAM by an OS. It probably has relocation fixups that need to be completed, and may even dynamically load some libraries. In a ROM system, dynamic loading is (usually) not supported, so you won't be doing that. But you still need a raw binary image (often in a HEX format suitable for a PROM programmer of some form), so you will need to use the objcopy utility from binutils to transform the linker output to a suitable format.
So, to answer the actual question you asked...
You use a linker script to specify the target addresses of each section of your program's image. In that script, you have several options for dealing with device registers, but all of them involve putting the text, data, bss stack, and heap segments in address ranges that avoid the hardware registers. There are also mechanisms available that can make sure that ld throws an error if you overfill your ROM or RAM, and you should use those as well.
Actually getting the device addresses into your C code can be done with #define as in your example, or by declaring a symbol directly in the linker script that is resolved to the base address of the registers, with a matching extern declaration in a C header file.
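As a minimal sketch of that second option (made-up names throughout): the linker script, or a --defsym option, defines a symbol at the register base address, and the C side only declares it and takes its address.
#include <stdint.h>

// The symbol itself is provided by the linker script / --defsym; only its
// address is meaningful, so never read or write it directly by name.
extern volatile uint8_t fpga_regs_base;

#define FPGA_REGS ((volatile uint8_t *)&fpga_regs_base)

void clear_first_register(void)
{
    FPGA_REGS[0] = 0u;
}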
Although it is possible to use GCC's section attribute to define an instance of an uninitialized struct as being located in a specific section (such as FPGA_REGS), I have found that not to work well in real systems. It can create maintenance issues, and it becomes an expensive way to describe the full register map of the on-chip devices. If you use that technique, the linker script would then be responsible for mapping FPGA_REGS to its correct address.
In any case, you are going to need to get a good understanding of object file concepts such as "sections" (specifically the text, data, and bss sections at minimum), and may need to chase down details that bridge the gap between hardware and software such as the interrupt vector table, interrupt priorities, supervisor vs. user modes (or rings 0 to 3 on x86 variants) and the like.
Typically these addresses are beyond the reach of your process. So, your linker wouldn't dare put stuff there.
If the memory location has a special meaning on your architecture, the compiler should know that and not put any variables there. That would be similar to the IO mapped space on most architectures. It has no knowledge that you're using it to store values, it just knows that normal variables shouldn't go there. Many embedded compilers support language extensions that allow you to declare variables and functions at specific locations, usually using #pragma. Also, generally the way I've seen people implement the sort of memory mapping you're trying to do is to declare an int at the desired memory location, then just treat it as a global variable. Alternately, you could declare a pointer to an int and initialize it to that address. Both of these provide more type safety than a macro.
To expand on litb's answer, you can also use the --just-symbols={symbolfile} option to define several symbols, in case you have more than a couple of memory-mapped devices. The symbol file needs to be in the format
symbolname1 = address;
symbolname2 = address;
...
(The spaces around the equals sign seem to be required.)
Often, for embedded software, you can define within the linker file one area of RAM for linker-assigned variables, and a separate area for variables at absolute locations, which the linker won't touch.
If you fail to do this, you should get a linker error, since the linker should spot that it is trying to place a variable at a location already occupied by a variable with an absolute address.
This depends a bit on what OS you are using. I'm guessing you are using something like DOS or VxWorks. Generally the system will have certain areas of the memory space reserved for hardware, and compilers for that platform will always be smart enough to avoid those areas for their own allocations. Otherwise you'd be continually writing random garbage to disk or line printers when you meant to be accessing variables.
In case something else was confusing you, I should also point out that #define is a preprocessor directive. No code gets generated for that. It just tells the compiler to textually replace any foo_reg it sees in your source file with *(int *)0x100. It is no different than just typing *(int *)0x100 in yourself everywhere you had foo_reg, other than it may look cleaner.
What I'd probably do instead (in a modern C compiler) is:
// access register 'foo_reg', which is located at address 0x100
volatile int *const foo_reg = (volatile int *)0x100;

*foo_reg = 1;       // write to foo_reg
int x = *foo_reg;   // read from foo_reg
