How do global variables contribute to the size of the executable? - c

Does having global variables increase the size of the executable? If yes how? Does it increase only the data section size or also the text section size?
If I have a global variable and initialization as below:
char g_glbarr[1024] = {"jhgdasdghaKJSDGksgJKASDGHKDGAJKsdghkajdgaDGKAjdghaJKSDGHAjksdghJKDG"};
Now, does this add 1024 to data section and the size of the initilization string to text section?
If instead if allocating space for this array statically, if I malloc it, and then do a memcpy, only the data section size will reduce or the text section size also will reduce?

Yes, it does. Basically compilers store them to data segment. Sometimes if you use a constant char array in you code (like printf("<1024 char array goes here");) it will go to data segment (AFAIK some old compilers /Borland?/ may store it in the text segment). You can force the compiler to put a global variable in a custom section (for VC++ it was #pragma data_seg(<segment name>)).
Dynamic memory allocation doesn't affect data/text segments, since it allocates memory in the heap.

The answer is implementation-dependent, but for sane implementations this is how it works for variables with static storage duration (global or otherwise):
Whenever the variable is initialized, the whole initialized value of the object will be stored in the executable file. This is true even if only the initial part of it is explicitly initialized (the rest is implicitly zero).
If the variable is constant and initialized, it will be in the "text" segment, or equivalent. Some systems (modern ELF-based, maybe Windows too?) have a separate "rodata" segment for read-only data to allow it to be marked non-executable, separate from program code.
Non-constant initialized variables will be in the "data" segment in the executable, which is mapped into memory in copy-on-write mode by the operating system when the program is loaded.
Uninitialized variables (which are implicitly zero as per the standard) will have no storage reserved in the executable itself, but a size and offset in the "bss" segment, which is created at program load-time by the operating system.
Such uninitialized variables may be created in a separate read-only "bss"-like segment if they're const-qualified.

I am not speaking as an expert, but I would guess that simply having that epic string literal in your program would increase the size of your executable. What you do with that string literal doesn't matter, because it has to be stored somewhere.
Why does it matter which "section" of the executable is increased? This isn't a rhetorical question!

The answer is slightly implementation sensitive, but in general, no. Your g_glbarr is really a pointer to char, or an address. The string itself will be put into the data section with constant strings, and g_glbarr will become a symbol for the address of the string at compile time. You don't end up allocating space for the pointer and the compiler simply resolves the address at link time.
Update
#Jay, it's sorta kinda the same. The integers (usually) just are in-line: the compiler will come as close as it can to just putting the constant in the code, because that's such a common case that most normal architectures have a straightforward way of doing it from immediate data. The string constants will still be in some read-only data section. So when you make something like:
// warning: I haven't compiled this and wouldn't normally
// do it quite this way so I'm not positive this is
// completely grammatical C
struct X {int a; char * b; } x = { 1, "Hello" } ;
the 1 becomes "immediate" data, the "Hello" is allocated in read-only data somewhere, and the compiler will just generate something that allocates a piece of read-write data that looks something like
x:
x.a: WORD 1
x.b WORD #STR42
where STR42 is a symbolic name for the location of the string "Hello" in memory. Then when everything is linked together, the #STR42 is replaced with the actual virtual address of the string in memory.

Related

In C, how do I make a string with non-static duration time?

I have a string, that is only used once when my application launches. Ordinary string literals, eg. "Hello" are static, meaning they're only deallocated when the program ends. I don't want that. They can be deallocated earlier. How do I say, Hey, like, this string literal shouldn't be static. It should be deallocated when the scope ends. How do I do that? For example,
memcpy(GameDir+HomeDirLen, "/.Data", 7);
The "/.Data" is still stored in ram as the literal even long after the this line of code runs. That's a waste, because it's only used once.
With typical implementations, if your program contains the string "/.Data" anywhere, either as a literal or as an initializer for an array of any duration, then the program is going to contain those bytes somewhere in the executable. They'll be loaded (or mapped) into memory when the program loads, and I don't know of any implementation that can free such memory before the program exits. So the other answers so far don't really accomplish what you want.
(If your array was of auto duration, then initializing is typically done under the hood by copying from an anonymous static string. Or it could be done by a sequence of immediate store instructions, which probably uses even more memory.)
So if you really want to ensure that those bytes don't occupy memory for the life of the process, you'll have to get them from somewhere other than the program itself. For instance, you could store the string in a file, open it, and read the string into an auto or malloced array. Then you really will recover the memory when the array goes out of scope or is freed (assuming, of course, that free actually does recover memory in a way that's useful to you). You could also use mmap if your system provides it.
On the other hand, modern operating systems usually have virtual memory. So if your string literal is in the read-only data section of the program, then if physical memory becomes tight, the system can simply drop that page of physical memory and use it for something else. If your program should attempt to access that data again, the system will allocate a new page and transparently populate it from the executable file on disk - but if you never access it, that will never happen.
Of course this doesn't help much if your string is really only 7 bytes, because there will be lots of other stuff in that page of memory (a page is commonly 4KB or somewhere around there). But if your string is really big, or you have a lot of such strings, then this effect may work just as well as actually freeing the memory. You may even be able to use various compiler-specific options to ensure that all your only-needed-once strings are placed contiguously in the executable, so that they will all be in the same pages of memory.
I have a string, that is only used once when my application launches.
Ordinary string literals, eg. "Hello" are static, meaning they're only
deallocated when the program ends. I don't want that. They can be
deallocated earlier. How do I say, Hey, like, this string literal
shouldn't be static. It should be deallocated when the scope ends. How
do I do that?
You cannot. All string literals have static storage duration, and that's really the only way they could work. If you have a string literal in your program source that is used in any way, then the program image has to contain the bytes of the literal's representation somewhere among the program data. If the literal appears inside a function, as must be the case in your example, then the representation needs to be retained for use each time the function is called. Similar applies to uses at file scope: string literals used there typically are accessible for the entire run of the program.
The exception is string literals used as initializers for (other) character arrays with static storage duration. Such an initialization results in, initially, two identical copies of the same data, at most one of which is actually accessible at run time. There's no use for retaining the data for the literal separately. C does not specify a way for you to express that the literal should not be retained, but your compiler is at liberty to omit the unneeded duplicate at its own discretion, and at least some do.
Compilers may also fold identical string literals, and perhaps even fold literals that just have identical tails, and / or perform other space-saving optimizations. And your compiler is likely to be better than you are at recognizing when and how such optimizations can safely be performed.
This answer does not really match your specific need. I'll leave it for the comments.
You can use a compound literal ... and a pointer to it
char *p = (char[]){"Hello!"}; // vs char *p = "Hello!"
*p = 'C'; // *p = 'C'; // illegal
see https://ideone.com/62UNKO

Peculiar memory allocation of global variables by c compiler

When I am declaring some variable outside main then compile stores them in some peculiar way.
int i=1,j=1;
void main(void)
{
printf("%d\n%d",&i,&j);
}
If both i and j are not initialized or equals 0 or equals some positive values then they are stored at continuous address spaces in memory whereas if i=0 and j = some +ve integer then their addresses are separated by fairly large distance.
The problem with is when they are stored on contiguous address spaces it causes some real performance issues like false sharing (have a look here). I've learned that to prevent this, there should be some space between variable's addresses which is automatically provided when i=0 and j=any +ve value.
Now, what I want to understand is:
Why the compiler stores variables to noncontinuous addresses only when one initialized to 0 and other initialized to positive values, and
How can I intentionally do what compiler is doing automatically i.e allocating variables to fairly separated address space.
(Using devcpp gcc 4.9.2)
Assuming you meant printf("%p, %p\n",(void *)&i,(void *)&j);, note the following:
It is not mandated by C specs to allocate variables in contiguous memory.
Often globals initialized with 0 are kept in BSS section (which is a part of data section) to save binary size. Other globals are kept in rest of the data section. (Depends on implementation detail, not mandated by C specs)
How can I intentionally do what compiler is doing automatically?
This is compiler specific question and your compiler documentation should possibly contain an answer to this.
One problem there,
printf("%d\n%d",&i,&j);
invokes undefined behavior. So, the outputs cannot be justified in any way. You need to use %p format specifier and cast the corresponding argument to (void *) to print a pointer.
That said, C standard does neither impose any constraints nor provide any guideline on where and how the variables will be stored in memory. It's up to the compiler implementation to decide how to place different variables in memory. You need to check the documentation of the compiler in use to find out the rules your compiler is following.
To elaborate in a generic way, an object file consists of many segments, like
Header (descriptive and control information)
Code segment ("text segment", executable code)
Data segment (initialized static variables)
Read-only data segment (rodata, initialized static constants)
BSS segment (uninitialized static data, both variables and constants)
External definitions and references for linking
Relocation information
Dynamic linking information
Debugging information
and it's up to the compiler to decide the address space (range/value) to be used for each segment.
As per the rules,
Global variables (i.e., having static storage duration) left uninitialized and initialized with 0 are placed in .bss segment.
Variables initialized with a non-zero value are placed in the .data segment
so, it's fair enough to say that the addresses of two variables pertaining to two different segments will not be contiguous.
Now, your observation checks out.
If both i and j are not initialized or equals 0 or equals some positive values then they are stored at continuous address spaces in memory
yes, then all of them go to either .bss or .data and compiler choose to place them one after another, usually.
whereas if i=0 and j = some +ve integer then their addresses are separated by fairly large distance.
This also holds true, both the variables are now placed in different segments.

How are const char and char pointers represented in memory in STM32

How does MCU now that the string a variable is pointing on is in data memory or in program memory?
What does compiler do when I'm casting a const char * to char * (e.g. when calling strlen function)?
Can char * be used as a char * and const char * without any performance loss?
The STM32s use a flat 32-bit address space, so RAM and program memory (flash) are in the same logical space.
The Cortex core of course knows which type of memory is where, probably through hardware address decoders that are triggered by the address being accessed. This is of course way outside the scope of what C cares about, though.
Dropping const is not a run-time operation, so there should be no performance overhead. Of course dropping const is bad, since somewhere you risk someone actually believing that a const pointer means data there won't be written to, and going back on that promise can make badness happen.
By taking a STM32F4 example with 1MB flash/ROM memory and 192KB RAM memory (128KB SDRAM + 64KB CCM) - the memory map looks something as follows:
Flash/ROM - 0x08000000 to 0x080FFFFF (1MB)
RAM - 0x20000000 to 0x2001FFFF (128KB)
There's more areas with separate address spaces that I won't cover here for the simplicity of the explanation. Such memories include Backup SRAM and CCM RAM, just to name two. In addition, each area may be further divided sections, such as RAM being divided to bss, stack and heap.
Now onto your question about strings and their locations - constant strings, such as:
const char *str = "This is a string in ROM";
are placed in flash memory. During compilation, the compiler places a temporary symbol that references such string. Later during linking phase, the linker (which knows about concrete values for each memory section) lays down all of your data (program, constant data etc.) in each section one after another and - once it knows concrete values of each such object - replaces those symbols placed by the compiler with concrete values which then appear in your binary. Because of this, later on during runtime when the assignment above is done, your str variable is simply assigned a constant value deduced by the linker (such as 0x08001234) which points directly to the first byte of the string.
When it comes to dynamically allocated values - whenever you call malloc or new a similar task is done. Assuming sufficient memory is available, you are given the address to the requested chunk of memory in RAM and those calculations are during runtime.
As for the question regarding const qualifier - there is not meaning to it once the code is executed. For example, during runtime the strlen function will simply go over memory byte-by-byte starting at the passed location and ending once binary 0 is encountered. It doesn't matter what "type" of bytes are being analyzed, because this information is lost once your code is converted to byte code. Regarding const in your context - const qualifier appearing in function parameter denotes that such function will not modify the contents of the string. If it attempted to, a compilation error would be raised, unless it implicitly performs a cast to a non-const type. You may, of course, pass a non-const variable as a const parameter of a function. The other way however - that is passing a const parameter to a non-const function - will raise an error, as this function may potentially modify the contents of the memory you point to, which you implicitly specified to be non-modifiable by making it const.
So to summarize and answer your question: you can do casts as much as you want and this will not be reflected at runtime. It's simply an instruction to the compiler to treat given variable differently than the original during its type checks. By doing an implicit cast, you should however be aware that such cast may potentially be unsafe.
With and without const, assuming your string is truly read only, is going to change whether it lands in .data or .rodata or some other read only section (.text, etc). Basically is it going to be in flash or in ram.
The flash on these parts if I remember right at best has an extra wait state or is basically half the speed of ram...at best. (Fairly common for mcus in general, there are exceptions though). If you are running in the slower range of clocks, of you boost the clock the the ram performance vs flash will improve. So having it in flash (const) vs sram is going to be slower for any code that parses through that string.
This assumes your linker script and bootstrap are such that .data is actually copied to ram on boot...

How a pointer initialization C statement in global space gets its assigned value during compile/link time?

The background of this question is to understand how the compiler/linker deals with the pointers when it is initialized in global space.
For e.g.
#include <stdio.h>
int a = 8;
int *p = &a;
int main(void) {
printf("Address of 'a' = %x", p);
return 0;
}
Executing the above code prints the exact address for a.
My question here is, during at which process (compile? or linker?) the pointer p gets address of a ? It would be nice if your explanation includes equivalent Assembly code of the above program and how the compiler and linker deals with pointer assignment int *p = &a; in global space.
P.S: I could find lot of examples when the pointer is declared and initialized in local scope but hardly for global space.
Thanks in advance!
A module is linked (often named crt0.o) along with your program code, which is responsible for setting up the environment for a C program. There will be global and static variables initialized which is executed before main is called.
The actual address of the global variables are determined by the operating system, when it loads an executable and performs the necessary relocations so that the new process can be executed.
To run a program, the system has to load it into RAM. So it creates one huge memory block containing the actual compiled instructions. This block usually also contains a "data section" which contains strings etc. If you declare a global variable, what compilers usually do is reserve space for that variable in such a data section (there's usually several, non-writable ones for strings, and writable ones for globals etc.).
Whenever you reference the global, it just records the offset from the current instruction to that global. So an instruction can just calculate [current instruction address] + [offset] to get at the global, wherever it ended up being loaded. Since space in the data section has been reserved in the file anyway, they can write any (constant) value in there you want, and it will get loaded with the rest of the code.
This is how it works in C, and is why C only allows constants. C++ works like Devolus wrote, where there is extra code that is run before main(). Effectively they rename the main function and give you a function that does the setup, then calls your main function. This allows C++ to call constructors.
There are also some optimizations like, if a global is initialized to zero, it usually just gets an offset in a "zero" section that doesn't exist in the file. The file just says: "After this code, I want 64 bytes of zeroes". That way, your file doesn't waste space on disk with hundreds of "empty" bytes.
It gets a tad more complicated if you have dynamically loaded libraries (dylibs or DLLs), where you have two segments loaded into separate memory blocks. Since neither knows where in RAM the other one ended up, the executable file contains a list of name -> offset mappings. When you load a library, the loader looks up the symbol in the (already loaded) other library and calculates the actual address at which e.g. the global is at, before main() is called (and before any of the constructors run).

Where are constant variables stored in C?

I wonder where constant variables are stored. Is it in the same memory area as global variables? Or is it on the stack?
How they are stored is an implementation detail (depends on the compiler).
For example, in the GCC compiler, on most machines, read-only variables, constants, and jump tables are placed in the text section.
Depending on the data segmentation that a particular processor follows, we have five segments:
Code Segment - Stores only code, ROM
BSS (or Block Started by Symbol) Data segment - Stores initialised global and static variables
Stack segment - stores all the local variables and other informations regarding function return address etc
Heap segment - all dynamic allocations happens here
Data BSS (or Block Started by Symbol) segment - stores uninitialised global and static variables
Note that the difference between the data and BSS segments is that the former stores initialized global and static variables and the later stores UNinitialised ones.
Now, Why am I talking about the data segmentation when I must be just telling where are the constant variables stored... there's a reason to it...
Every segment has a write protected region where all the constants are stored.
For example:
If I have a const int which is local variable, then it is stored in the write protected region of stack segment.
If I have a global that is initialised const var, then it is stored in the data segment.
If I have an uninitialised const var, then it is stored in the BSS segment...
To summarize, "const" is just a data QUALIFIER, which means that first the compiler has to decide which segment the variable has to be stored and then if the variable is a const, then it qualifies to be stored in the write protected region of that particular segment.
Consider the code:
const int i = 0;
static const int k = 99;
int function(void)
{
const int j = 37;
totherfunc(&j);
totherfunc(&i);
//totherfunc(&k);
return(j+3);
}
Generally, i can be stored in the text segment (it's a read-only variable with a fixed value). If it is not in the text segment, it will be stored beside the global variables. Given that it is initialized to zero, it might be in the 'bss' section (where zeroed variables are usually allocated) or in the 'data' section (where initialized variables are usually allocated).
If the compiler is convinced the k is unused (which it could be since it is local to a single file), it might not appear in the object code at all. If the call to totherfunc() that references k was not commented out, then k would have to be allocated an address somewhere - it would likely be in the same segment as i.
The constant (if it is a constant, is it still a variable?) j will most probably appear on the stack of a conventional C implementation. (If you were asking in the comp.std.c news group, someone would mention that the standard doesn't say that automatic variables appear on the stack; fortunately, SO isn't comp.std.c!)
Note that I forced the variables to appear because I passed them by reference - presumably to a function expecting a pointer to a constant integer. If the addresses were never taken, then j and k could be optimized out of the code altogether. To remove i, the compiler would have to know all the source code for the entire program - it is accessible in other translation units (source files), and so cannot as readily be removed. Doubly not if the program indulges in dynamic loading of shared libraries - one of those libraries might rely on that global variable.
(Stylistically - the variables i and j should have longer, more meaningful names; this is only an example!)
Depends on your compiler, your system capabilities, your configuration while compiling.
gcc puts read-only constants on the .text section, unless instructed otherwise.
Usually they are stored in read-only data section (while global variables' section has write permissions). So, trying to modify constant by taking its address may result in access violation aka segfault.
But it depends on your hardware, OS and compiler really.
offcourse not , because
1) bss segment stored non inilized variables it obviously another type is there.
(I) large static and global and non constants and non initilaized variables it stored .BSS section.
(II) second thing small static and global variables and non constants and non initilaized variables stored in .SBSS section this included in .BSS segment.
2) data segment is initlaized variables it has 3 types ,
(I) large static and global and initlaized and non constants variables its stord in .DATA section.
(II) small static and global and non constant and initilaized variables its stord in .SDATA1 sectiion.
(III) small static and global and constant and initilaized OR non initilaized variables its stord in .SDATA2 sectiion.
i mention above small and large means depents upon complier for example small means < than 8 bytes and large means > than 8 bytes and equal values.
but my doubt is local constant are where it will stroe??????
This is mostly an educated guess, but I'd say that constants are usually stored in the actual CPU instructions of your compiled program, as immediate data. So in other words, most instructions include space for the address to get data from, but if it's a constant, the space can hold the value itself.
This is specific to Win32 systems.
It's compiler dependence but please aware that it may not be even fully stored. Since the compiler just needs to optimize it and adds the value of it directly into the expression that uses it.
I add this code in a program and compile with gcc for arm cortex m4, check the difference in the memory usage.
Without const:
int someConst[1000] = {0};
With const:
const int someConst[1000] = {0};
Global and constant are two completely separated keywords. You can have one or the other, none or both.
Where your variable, then, is stored in memory depends on the configuration. Read up a bit on the heap and the stack, that will give you some knowledge to ask more (and if I may, better and more specific) questions.
It may not be stored at all.
Consider some code like this:
#import<math.h>//import PI
double toRadian(int degree){
return degree*PI*2/360.0;
}
This enables the programmer to gather the idea of what is going on, but the compiler can optimize away some of that, and most compilers do, by evaluating constant expressions at compile time, which means that the value PI may not be in the resulting program at all.
Just as an an add on ,as you know that its during linking process the memory lay out of the final executable is decided .There is one more section called COMMON at which the common symbols from different input files are placed.This common section actually falls under the .bss section.
Some constants aren't even stored.
Consider the following code:
int x = foo();
x *= 2;
Chances are that the compiler will turn the multiplication into x = x+x; as that reduces the need to load the number 2 from memory.
I checked on x86_64 GNU/Linux system. By dereferencing the pointer to 'const' variable, the value can be changed. I used objdump. Didn't find 'const' variable in text segment. 'const' variable is stored on stack.
'const' is a compiler directive in "C". The compiler throws error when it comes across a statement changing 'const' variable.

Resources