How are const char and char pointers represented in memory in STM32 - c

How does the MCU know that the string a variable points to is in data memory or in program memory?
What does the compiler do when I cast a const char * to char * (e.g. when calling the strlen function)?
Can a char * be used both as a char * and as a const char * without any performance loss?

The STM32s use a flat 32-bit address space, so RAM and program memory (flash) are in the same logical space.
The Cortex core of course knows which type of memory is where, probably through hardware address decoders that are triggered by the address being accessed. This is of course way outside the scope of what C cares about, though.
Dropping const is not a run-time operation, so there should be no performance overhead. Of course dropping const is bad, since somewhere you risk someone actually believing that a const pointer means data there won't be written to, and going back on that promise can make badness happen.

Taking an STM32F4 example with 1MB of flash/ROM and 192KB of RAM (128KB SRAM + 64KB CCM), the memory map looks something like this:
Flash/ROM - 0x08000000 to 0x080FFFFF (1MB)
RAM - 0x20000000 to 0x2001FFFF (128KB)
There are more areas with separate address ranges that I won't cover here, for the simplicity of the explanation. Such memories include backup SRAM and CCM RAM, to name two. In addition, each area may be further divided into sections, such as RAM being divided into .bss, stack and heap.
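For reference, a GNU ld linker script for such a part typically declares those regions in a MEMORY block along these lines (region names are illustrative; the CCM region sits at 0x10000000 on the F4):

```ld
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 1M
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
  CCM   (rw)  : ORIGIN = 0x10000000, LENGTH = 64K
}
```

The linker uses these regions to assign the concrete addresses discussed below.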
Now onto your question about strings and their locations - constant strings, such as:
const char *str = "This is a string in ROM";
are placed in flash memory. During compilation, the compiler places a temporary symbol that references such a string. Later, during the linking phase, the linker (which knows the concrete address range of each memory section) lays out all of your data (program, constant data, etc.) in each section one after another and, once it knows the concrete address of each such object, replaces the symbols placed by the compiler with concrete values, which then appear in your binary. Because of this, when the assignment above is executed at runtime, your str variable is simply assigned a constant value resolved by the linker (such as 0x08001234) which points directly to the first byte of the string.
When it comes to dynamically allocated values - whenever you call malloc or new, a similar task is done. Assuming sufficient memory is available, you are given the address of the requested chunk of memory in RAM, and those calculations are performed at runtime.
As for the question regarding the const qualifier - it has no meaning once the code is executing. For example, at runtime the strlen function simply walks memory byte by byte, starting at the passed location and stopping once a binary 0 is encountered. It doesn't matter what "type" of bytes are being scanned, because that information is lost once your code is converted to machine code. Regarding const in your context - a const qualifier on a function parameter denotes that the function will not modify the contents of the string. If it attempted to, a compilation error would be raised, unless it explicitly casts the const away first. You may, of course, pass a non-const variable as a const parameter of a function. The other way around, however - passing a pointer to const where a pointer to non-const is expected - will raise a diagnostic, as that function may potentially modify the contents of the memory you point to, which you declared to be non-modifiable by making it const.
So to summarize and answer your question: you can cast as much as you want and it will not be reflected at runtime. A cast is simply an instruction to the compiler to treat a given variable differently from the original during its type checks. When casting away const, however, you should be aware that such a cast may be unsafe.

With and without const, assuming your string is truly read-only, is going to change whether it lands in .data or .rodata or some other read-only section (.text, etc.) - basically, whether it is going to be in flash or in RAM.
The flash on these parts, if I remember right, at best has an extra wait state or is basically half the speed of RAM... at best. (Fairly common for MCUs in general; there are exceptions though.) If you are running in the slower range of clocks the gap may be small, but if you boost the clock, the RAM performance relative to flash will improve. So having the string in flash (const) vs SRAM is going to be slower for any code that parses through it.
This assumes your linker script and bootstrap are such that .data is actually copied to ram on boot...

Related

Where does a local const variable get stored?

Where does a local const variable get stored? I have verified that everywhere in the function where the const variable is used, it gets replaced with its value (like immediate addressing mode). But if a pointer is assigned to it, then it gets stored on the stack. Here I do not understand one thing: how does the processor know it is a constant value? Is there any read-only section in the stack like there is in the .data section?
Generally, the processor does not know that an object is declared const in C.
Systems commonly have regions of memory that are marked read-only after a program is loaded, and static const objects are stored in such memory. For these objects, the processor enforces the read-only property.
Systems generally do not have read-only memory used for the stack. This would be inherently difficult: the memory would need to be read-write while a function is starting, so that its stack frame can be constructed, but read-only at other times. So the program would be frequently changing the hardware memory protection settings, which would impair performance and is generally not considered worthwhile.
So programs generally have only a read-write stack available. When you declare an automatic (rather than static) const object, where can the compiler put it? As you note, it is often optimized into an immediate operand in instructions. However, when you take its address, it must have an address, so it must be in memory.
One idea might be that, since it is const, it will not change, so we only need one copy, and it can be stored in the static read-only section instead of on the stack. However, the C standard says that each distinct object has a distinct address. To comply with that requirement, the compiler has to create a separate instance of the object in memory each time it is created in the C code. Putting it on the stack is an easy way to do this.
I think it totally depends on your toolchain-specific implementation. Variables are stored in RAM, the program in flash memory, and constants in either RAM or flash.
Correct me if I'm wrong.

Hitech C data buffers in program memory

The C18 compiler allows variables in program memory with the ROM qualifier, but Hi-Tech C seems rather reluctant to utilize the Harvard architecture to its best. So is there a way to create data buffers in program memory with the Hi-Tech C compiler (I am ready to compromise on access speed)?
I've seen indications of possibility with the psect but don't have any working implementation.
The HI-TECH PICC18 compiler places objects declared as const into program space by default. No special qualifiers like C18's RAM/ROM are needed:
3.5.3 Objects in Program Space
const objects are usually placed in program space. On the PIC18 devices, the program space is
byte-wide, the compiler stores one character per byte location and values are read using the table
read instructions. All const-qualified data objects and string literals are placed in the const psect.
The const psect is placed at an address above the upper limit of RAM since RAM and const
pointers use this address to determine if an access to ROM or RAM is required.
Note that placing frequently updated data in the microcontroller's flash memory may not be such a good idea, as flash has a limited number of program/erase cycles.
far pointers can be used to dereference program memory:
3.4.12.2 Const and Far Pointers
const and far pointers can either be 16 or 24 bits wide. Their size can be toggled with the --CP=24
or --CP=16 command line option. The code used to dereference them also changes with their size.
The same pointer size must be used for all modules in a project.
A pointer to far is identical to a pointer to const, except that pointers to far may be used to
write to the address they hold. A pointer to const objects cannot be used to write as the const
qualifier imposes that the object is read-only.
const and far pointers which are 16 bits wide can access all RAM areas and most of the program
space. At runtime when dereferenced, the contents of the pointer are examined. For addresses above
the upper limit of RAM the program space is accessed using table read or table write instructions.
Addresses below the upper limit of RAM access the data space. Even if the address held by a pointer
to const is in RAM, the RAM location may not be changed.
The default linker options always place const data at addresses above the upper limit of the data
space so that the correct memory space is accessed when dereferencing with pointers.
If the target device selected has more than 64k bytes of program space memory, then only the
lower 64k bytes may be accessed with 16-bit wide pointers. Provided that all program space objects
that need to be dereferenced are in the lower 64k bytes, 16-bit pointers to const and far objects
may still be used. The smaller pointer size results in less RAM required and less code produced and
so should be used whenever possible.
const and far pointers which are 24 bits wide can access all RAM areas and all of the program
space. At runtime when dereferenced, the contents of the pointer are examined. If bit number 21
in the address is set, the address is assumed to be a RAM address. Bit number 21 of the address is
then ignored. If Bit number 21 is clear, then the address is assumed to be of an object in the program
space and the access is performed using table read or table write instructions. Again, no writes to
objects are permitted using a pointer to const.
Note that when dereferencing a 24-bit pointer, the most significant implemented bit (bit number
21) of the TBLPTRU register may be overwritten. This bit may be used to enable access to the
configuration area of the PIC18 device. If loading the table pointer registers from hand-written
assembler code, make no assumptions about the state of bit number 21 prior to executing table read
or write instructions.
The quotes are from HI-TECH PICC18 v9.51 manual.

const value vs. #define, which kind of chip resource will be used?

If I define a macro or use a static const value in an embedded system,
which kind of memory will be used: chip flash or chip RAM?
Which way is better?
I believe the answer is more complex.
Edit: I apologise for using 'should' and 'might', but without a specific compiler or debugger I find it hard to be accurate and precise. Maybe if the question said which compiler and platform are targeted, we could be clearer?
#define NAME ((type_cast)value) consumes no space until it appears in the code. The compiler may be able to deduce something using its value (compared to using a variable with an unknown run-time value), and hence might change the code generated so that it effectively consumes no space, or may even reduce the size of the code. If the compiler's analysis is that the literal value will be needed at run time, then it consumes code space. The literal value is known, so the compiler should be able to allocate the optimum amount of space. Depending on the processor, it should be stored in flash, but might not be in-line in the code; it may instead sit in a 'literal pool', a block of constant values placed near the code so that compact addressing modes might be used. The compiler will likely make good decisions.
static const type name = value; should not consume space until it is used in the code. Even when it is used in code, it might or might not consume 'space' depending on your compiler (and, I think, the C standard it is compiling) and how the code uses the value.
If the address of the name is never taken, then the compiler does not have to store it. If the address of the value is taken (and that code is not eliminated) then the value must be in memory. Smart compilers will detect whether or not any code in the source file takes its address. Even though it might be stored, the compiler might generate better (faster or more compact code) by not using the stored value.
The compiler might do as good a job with this as with #define NAME, though it might also do worse.
If the value had its address taken, then the compiler treats the variable as an initialised variable, which consumes space to store the constant value. The compiler doesn't really put values into RAM or flash. That depends on the linker. In gcc, there are 'attributes' which can be used to tell the linker which segment to put a variable into. By default the compiler puts initialised variables into the default data segment, and initialised const into a read-only segment. By using attributes, the programmer can put variables into any segment. Using an appropriate linker script, which usually comes with the toolchain, segments can be put in flash. Gcc uses the readonly data segment for data like literal strings.
name should be available in a debugger, but the #define NAME will not be.
There is a third approach, which is to use enum's:
enum CONSTANTS { name = 1234, height = 456 ... };
These may be treated by the compiler like #define constants, though they are not quite as flexible because they are int-sized (IIRC). There is no way to take the address of an enum value, so the compiler has as many options to generate good code as with a #define NAME. They will often be available in the debugger.
const type name = value; may consume RAM. It must be in memory because the compiler can't know whether code in a different file uses it or takes its address (though gcc LTO might change that). The const tells the compiler to warn (or error) wherever code tries to change the value, e.g. using an assignment operator. Normally variables held in RAM are stored in the data or bss memory segments. By default gcc puts const into a read-only segment; the segment can be set using the command line option -mrodata=readonly-data-section. That segment is .rodata on ARM.
On embedded systems, all initialised global and static variables (const or not) are also held in flash, and copied to RAM when the program starts (before main() is called). All uninitialised global or static variables are set to 0 before main() is called.
The compiler might put const variables into their own memory segment (gcc does), which may allow a linker (e.g. ld) script to put them into flash and not allocate any RAM to them (this wouldn't work on e.g. AVR ATmega, which uses different instructions to load data from flash).
Well, if you #define a macro, no additional memory or code space (flash) is allocated for it. All the work is done at compile time.
If you use a static const global variable, binary code is generated for the initial value and memory is allocated for it, so both flash (a bigger bin file) and memory (chip RAM) are used.
In addition to what other said:
using #define tells nothing about storage. The define itself does not need any, but if you do something like int x = MY_DEFINE, that will of course use memory, and it will be non-const.
On some toolchains/systems you can actually get const variables into a special section that you can place in flash/ROM, typically using a custom linker script or compiler switches.

How do global variables contribute to the size of the executable?

Does having global variables increase the size of the executable? If yes how? Does it increase only the data section size or also the text section size?
If I have a global variable and initialization as below:
char g_glbarr[1024] = {"jhgdasdghaKJSDGksgJKASDGHKDGAJKsdghkajdgaDGKAjdghaJKSDGHAjksdghJKDG"};
Now, does this add 1024 to the data section size and the size of the initialization string to the text section size?
If instead of allocating space for this array statically I malloc it and then do a memcpy, will only the data section size shrink, or will the text section size shrink as well?
Yes, it does. Basically compilers store them in the data segment. Sometimes if you use a constant char array in your code (like printf("<1024 char array goes here");) it will go to the data segment (AFAIK some old compilers (Borland?) may store it in the text segment). You can force the compiler to put a global variable in a custom section (for VC++ it was #pragma data_seg(<segment name>)).
Dynamic memory allocation doesn't affect data/text segments, since it allocates memory in the heap.
The answer is implementation-dependent, but for sane implementations this is how it works for variables with static storage duration (global or otherwise):
Whenever the variable is initialized, the whole initialized value of the object will be stored in the executable file. This is true even if only the initial part of it is explicitly initialized (the rest is implicitly zero).
If the variable is constant and initialized, it will be in the "text" segment, or equivalent. Some systems (modern ELF-based, maybe Windows too?) have a separate "rodata" segment for read-only data to allow it to be marked non-executable, separate from program code.
Non-constant initialized variables will be in the "data" segment in the executable, which is mapped into memory in copy-on-write mode by the operating system when the program is loaded.
Uninitialized variables (which are implicitly zero as per the standard) will have no storage reserved in the executable itself, but a size and offset in the "bss" segment, which is created at program load-time by the operating system.
Such uninitialized variables may be created in a separate read-only "bss"-like segment if they're const-qualified.
I am not speaking as an expert, but I would guess that simply having that epic string literal in your program would increase the size of your executable. What you do with that string literal doesn't matter, because it has to be stored somewhere.
Why does it matter which "section" of the executable is increased? This isn't a rhetorical question!
The answer is slightly implementation-sensitive, but in general the literal does not also land in the text section. Note that g_glbarr is an array, not a pointer: the declaration reserves all 1024 bytes in the data section, and the initializer bytes are stored there in the executable image (with the uninitialized tail implicitly zero). g_glbarr becomes a symbol for the array's address at compile time; no separate pointer is allocated, and the compiler simply resolves the address at link time.
Update
@Jay, it's sorta kinda the same. The integers (usually) just end up in-line: the compiler will come as close as it can to just putting the constant in the code, because that's such a common case that most normal architectures have a straightforward way of doing it with immediate data. The string constants will still be in some read-only data section. So when you write something like:
// warning: I haven't compiled this and wouldn't normally
// do it quite this way so I'm not positive this is
// completely grammatical C
struct X {int a; char * b; } x = { 1, "Hello" } ;
the 1 becomes "immediate" data, the "Hello" is allocated in read-only data somewhere, and the compiler will just generate something that allocates a piece of read-write data that looks something like
x:
x.a: WORD 1
x.b: WORD #STR42
where STR42 is a symbolic name for the location of the string "Hello" in memory. Then when everything is linked together, the #STR42 is replaced with the actual virtual address of the string in memory.

When do I use xdata?

I am new at embedded system programming. I am working on a device that uses an 8051 chipset. I have noticed in the sample programs that when defining variables, sometimes they use the keyword xdata. like this...
static unsigned char xdata PatternSize;
while other times the xdata keyword is omitted.
My understanding is that the xdata keyword instructs the compiler that the variable is to be stored in external (flash) memory.
In what cases should I store variables externally with xdata? Accessing those variables takes longer, right? Values stored using xdata do not remain after a hard reset of the device do they?
Also, I understand that the static keyword means that the variable will persist through every call to the function it is defined in. Do static and xdata have to be used together?
The 8051 architecture has three separate address spaces. The core RAM uses an 8-bit address, so it can be up to 256 bytes; XDATA is a 16-bit address space (64 KB) with read/write capability; and the program space is a 16-bit address space with execution and read-only data capability. Because of its small address range and close coupling to the core, addressing the core RAM is more efficient in terms of code space and access cycles.
The original 8051 core had tiny on-chip RAM (an address space of 256 bytes, though some variants had half that in actual memory), and XDATA referred to off-chip data memory (as opposed to program memory). However, most modern 8051 architecture devices have on-chip XDATA and program memory.
So you might use the core memory when performance is critical and XDATA for larger memory objects. However, the compiler should in most cases make this decision for you (check your compiler's manual; it will describe in detail how memory is allocated). The instruction set makes it efficient to implement the stack in core memory, whereas static and dynamically allocated data would usually be more sensibly allocated in XDATA. If the compiler has an XDATA keyword, using it will override the compiler's strategy, so it should only be used when that strategy somehow fails, since it reduces the portability of the code.
[edit] Note also that the core memory includes a 32-byte bit-addressable region; the bit-addressing instructions use an 8-bit address into this region to access individual bits directly. The region exists within the 256-byte byte-addressable core memory, so it is both bit- and byte-addressable. [/edit]
xdata tells the compiler that the data is stored in external RAM so it has to use a different instruction to read and write that memory instead of internal RAM.
Accessing external data does take longer. I usually put interrupt variables in internal RAM and most large arrays in external RAM.
As to the state of the external RAM after a hard reset (not power cycle): That would depend on the hardware setup. Does a reset line go to the external chip? Also some chips come with XDATA within the CPU chip. Read that again. Some chips have an 8051 CPU plus some amount of XDATA within the IC.
static and xdata do not overlap. static tells the compiler how to allocate a variable (on the stack or at a fixed memory location); xdata tells the compiler how to reach that variable. static can also restrict the visibility of the variable's name to just that file. You can have an xdata static variable that is local to just one function, and a static variable that is local to a function but uses internal RAM.
An important point not yet mentioned is that because different instructions are used to access different memory areas, the hardware has no unified concept of a "pointer". Any address which is known to be in DATA/IDATA space may be uniquely identified with a one-byte pointer; likewise any address which is known to be in PDATA space. Any address which is known to be in CODE space may be identified with a two-byte pointer; likewise any address which is known to be in XDATA space. In many cases, though, a routine like memcpy won't know in advance which memory space should be used with the passed-in pointers. To accommodate that, 8x51 compilers generally use a three-byte pointer type which may be used to access things in any memory space (one byte selects which type of instructions should be used with the pointer, and the other bytes hold the value). A pointer declaration like:
char *ptr;
will define a three-byte pointer which can point to any memory space. Changing the declaration to
char xdata *data ptr;
will define a two-byte pointer which is stored in DATA space, but which can only point to things in the XDATA space. Likewise
char data * data ptr;
will define a two-byte pointer which is stored in DATA space, but which can only point to things in the DATA and IDATA spaces. Code which uses pointers that point to a known data space will be much faster (possibly by a factor of ten) than code which uses the "general-purpose" three-byte pointers.
How and when to use xData memory area depends on the system architecture. Some systems may have RAM at this address while others could have ROM or Flash. In either case, access will be slower than accessing internal RAM, ROM or Flash.
Generally speaking, large items, constant items and lesser used items should go into xData. There are no standard rules as to what goes in xData, as it depends on the architecture.
The 8051 has a 128 byte range of scratch pad "pseudo-registers" that (most) compilers use as the default for declared variables. But obviously this area is very small, and you want to be able to put variables in the 16 bit memory address space too. That's what the xdata (i.e. "external data") specifier is for. What to put where depends, obviously, on what the data is and how you plan on using it.
Basically, I think this is the wrong question. You need to understand your CPU architecture first before learning how to use the C compiler's 8051-specific features.