C memory management in gcc - c

I am using gcc version 4.7.2 on Ubuntu 12.10 x86_64.
First of all these are the sizes of data types on my terminal:
sizeof(char) = 1
sizeof(short) = 2 sizeof(int) = 4
sizeof(long) = 8 sizeof(long long) = 8
sizeof(float) = 4 sizeof(double) = 8
sizeof(long double) = 16
Now please have a look at this code snippet:
int main(void)
{
char c = 'a';
printf("&c = %p\n", &c);
return 0;
}
If I am not wrong we can't predict anything about the address of c. But each time this program gives some random hex address ending in f. So the next available location will be some hex value ending in 0.
I observed this pattern in case of other data types too. For an int value the address was some hex value ending in c. For double it was some random hex value ending in 8 and so on.
So I have 2 questions here.
1) Who is governing this kind of memory allocation ? Is it gcc or C standard ?
2) Whoever it is, Why it's so ? Why the variable is stored in such a way that next available memory location starts at a hex value ending in 0 ? Any specific benefit ?
Now please have a look at this code snippet:
int main(void)
{
double a = 10.2;
int b = 20;
char c = 30;
short d = 40;
printf("&a = %p\n", &a);
printf("&b = %p\n", &b);
printf("&c = %p\n", &c);
printf("&d = %p\n", &d);
return 0;
}
Now here what I observed is completely new for me. I thought the variable would get stored in the same order they are declared. But No! That's not the case. Here is the sample output of one of random run:
&a = 0x7fff8686a698
&b = 0x7fff8686a694
&c = 0x7fff8686a691
&d = 0x7fff8686a692
It seems that variables get sorted in increasing order of their sizes and then they are stored in the same sorted order but with maintaining the observation 1. i.e. the last variable (largest one) gets stored in such a way that the next available memory location is an hex value ending in 0.
Here are my questions:
3) Who is behind this ? Is it gcc or C standard ?
4) Why to waste the time in sorting the variables first and then allocating the memory instead of directly allocating the memory on 'first come first serve' basis ? Any specific benefit of this kind of sorting and then allocating memory ?
Now please have a look at this code snippet:
int main(void)
{
char array1[] = {1, 2};
int array2[] = {1, 2, 3};
printf("&array1[0] = %p\n", &array1[0]);
printf("&array1[1] = %p\n\n", &array1[1]);
printf("&array2[0] = %p\n", &array2[0]);
printf("&array2[1] = %p\n", &array2[1]);
printf("&array2[2] = %p\n", &array2[2]);
return 0;
}
Now this is also shocking for me. What I observed is that the array is always stored at some random hex value ending in '0' if the elements of an array >= 2 and if elements < 2
then it gets memory location following observation 1.
So here are my questions:
5) Who is behind this storing an array at some random hex value ending at 0 thing ? Is it gcc or C standard ?
6) Now why to waste the memory ? I mean array2 could have been stored immediately after array1 (and hence array2 would have memory location ending at 2). But instead of that array2 is stored at next hex value ending at 0 thereby leaving 14 memory locations in between. Any specific benefits ?

The address at which the stack and the heap start is given to the process by the operating system. Everything else is decided by the compiler, using offsets that are known at compile time. Some of these things may follow an existing convention followed in your target architecture and some of these do not.
The C standard does not mandate anything regarding the order of the local variables inside the stack frame (as pointed out in a comment, it doesn't even mandate the use of a stack at all). The standard only bothers to define order when it comes to structs and, even then, it does not define specific offsets, only the fact that these offsets must be in increasing order. Usually, compilers try to align the variables in such a way that access to them takes as few CPU instructions as possible - and the standard permits that, without mandating it.

Part of the reasons are mandated by the application binary interface (ABI) specifications for your system & processor.
See the x86 calling conventions and the SVR4 x86-64 ABI supplement (I'm giving the URL of a recent copy; the latest original is surprisingly hard to find on the Web).
Within a given call frame, the compiler could place variables in arbitrary stack slots. It may try (when optimizing) to reorganize the stack at will, e.g. by decreasing alignment constraints. You should not worry about that.
A compiler try to put local variables on stack location with suitable alignment. See the alignof extension of GCC. Where exactly the compiler put these variables is not important, see my answer here. (If it is important to your code, you really should pack the variables in a single common local struct, since each compiler, version and optimization flags could do different things; so don't depend on that precise behavior of your particular compiler).

Related

Pointer arithmetic on "neighbour" declared variables

Could someone explain why the following code behaves differently if the 3 first printf calls are there or not ?
#include <stdlib.h>
#include <stdio.h>
int main(){
int a = 1;
int b = 2;
int c = 3;
int* a_ptr = &a;
// comment out the following 3 lines
printf(" &a : %p\n", &a);
printf(" &b : %p\n", &b);
printf(" &c : %p\n", &c);
printf("*(a_ptr) : %d\n", *(a_ptr));
printf("*(a_ptr-1) : %d\n", *(a_ptr-1));
printf("*(a_ptr-2) : %d\n", *(a_ptr-2));
I was playing around to learn how variables are being stacked in memory depending on the order of declaration. By printing the addresses, I see they are 1 int size apart if they are declared after each other. After printing the addreses, I just subtract 1 and 2 from the address of a and dereference it. When I print it, it shows what i'd expect, the values of a, b and c. BUT, if i do not print the addresses before, I just get a=1, while b and c seem random to me. What is going on here ? Do you get the same behaviour on your machine ?
// without printing the addresses
*(a_ptr) : 1
*(a_ptr-1) : 4200880
*(a_ptr-2) : 6422400
// with printing
&a : 0061FF18
&b : 0061FF14
&c : 0061FF10
*(a_ptr) : 1
*(a_ptr-1) : 2
*(a_ptr-2) : 3
The C standard does not define what happens in the code you show, notably because the address arithmetic a_ptr-1 and a_ptr-2 is not defined. For an array of n elements, pointer arithmetic is defined only for adjusting pointers to locations from that corresponding to index 0 to that corresponding to index n (which is one beyond the last element of the array). For a single object, pointer arithmetic is defined as if it were an array of one element, so only for adjusting a pointer to the locations corresponding to index 0 and index 1 (just beyond the object). a_ptr-1 and a_ptr-2 would point to elements at indices −1 and −2, and the C standard does not define the behavior for these.
However, what is happening in the experiments you tried is:
When the addresses of a, b, and c are printed, the compiler has to ensure there are addresses for it to print. So it assigns memory locations to a, b, and c. It happens that these are consecutive in memory and in reverse order, and therefore a_ptr-1 happened to point to b and a_ptr-2 happened to point to c. (However, compilers may assign locations in memory based on alphabetical order of the names rather than declaration order, based on alignment and size and other properties, based on some arbitrary hash it uses to organize the names in a data structure, and/or other factors. In particular, if you compiled without optimization, and you request high optimization instead, the order may change.)
When the addresses are not printed, the compiler has no need to assign memory locations for b and c because they are not used in the code. In this case, a_ptr-1 and a_ptr-2 do not point to b and c but point to locations in memory that have been used for other purposes.

Why printf() gives a runtime error?

Why does this code below compiles and executes but doesn't print anything in output,
int i = 0;
printf(i = 0);
but this gives a runtime error,
int i = 0;
printf(i = 1);
int i = 0;
printf(i = 0);
The first argument to printf must be a char* value pointing to a format string. You gave it an int. That's the problem. The difference in behavior between printf(i = 0) and printf(i = 1) is largely irrelevant; both are equally wrong. (It's possible that the first passes a null pointer, and that printf detects and handles null pointers somehow, but that's a distraction from the real problem.)
If you wanted to print the value of i = 0, this is the correct way to do it:
printf("%d\n", i = 0);
You have a side effect in the argument (i = 0 is an assignment, not a comparison), which is legal but poor style.
If you have the required #include <stdio.h>, then your compiler must at least warn you about the type mismatch.
If you don't have #include <stdio.h>, then your compiler will almost certainly warn about calling printf without a declaration. (A C89/C90 compiler isn't strictly required to warn about this, but any decent compiler should, and a C99 or later compiler must.)
Your compiler probably gave you one or more warnings when you compiled your code. You failed to include those warnings in your question. You also failed to show us a complete self-contained program, so we can only guess whether you have the required #include <stdio.h> or not. And if your compiler didn't warn you about this error, you need to find out how to ask it for better diagnostics (we can't help with that without knowing which compiler you're using).
Expressions i = 0 and i = 1 in printf function will be evaluated to 0 and 1 (and i will be initialized to 0 and 1) respectively. So above printf statements after their expression evaluation will be equivalent to
printf(0); // To be clear, the `0` here is not a integer constant expression.
and
printf(1);
respectively.
0 and 1 both will be treated as address in printf statements and it will try to fetch string from these addresses. But, both 0 and 1 are unallocated memory addresses and accessing them will result in undefined behavior.
printf requires a const char * for input, whereas you're giving it an int
printf awaits a format string as its first argument. As strings (char*) are nothing else than pointers, you provide an address as the first parameter.
After an address and an integer are implicitly converted into each other,
printf(i = 0)
is (from the perspective of printf) the same as:
printf( 0 )
which is the same as
printf( NULL ) // no string given at all
If you provide 0 as parameter, printf will gracefully handle that case and don't do anything at all, because 0 is a reserved address meaning nothing at all.
But printf( 1 ) is different: printf now searches for a string at address 1, which is not a valid address for your program and so your program throws a segmentation fault.
[update]
The main reason why this works is a combination of several facts you need to know about how char arrays, pointers, assignments and printf itself work:
Pointers are implicitly convertible to int and vice versa, so the int value 17 for example gets converted to the memory address 0x00000011 (17) without any further notice. This is due to the fact that C is very close to the hardware and allows you to calculate with memory addresses. In fact, internally an int (or more specific: one special type of an integer) and a pointer are the same, just with a different syntax. This leads to a lot of problems when porting code from 32bit to 64bit-architecture, but this is another topic.
Assignments are different from comparations: i = 0 and i == 0 are totally different in meaning. i == 0 returns true when i contains the value 0 while i = 0 returns... 0. Why is that? You can concatenate for example the following variable assignments:
a = b = 3;
At first, b = 3 is executed, returning 3 again, which is then assigned to a. This comes in handy for cases like testing if there is more items in an array:
while( (data = get_Item()) )
This now leads to the next point:
NULL is the same as 0 (which also is the same as false). For the fact that integers and pointers can be converted into each other (or better: are physically the same thing, see my remarks above), there is a reserved value called NULL which means: pointing nowhere. This value is defined to be the virtual address 0x00000000 (32bit). So providing a 0 to a function that expects a pointer, you provide an empty pointer, if you like.
char* is a pointer. Same as char[] (there is a slight logical difference in the behaviour and in the allocation, but internally they are basically the same). printf expects a char*, but you provide an int. This is not a problem, as described above, int will get interpreted as an address. Nearly every (well written) function taking pointers as parameters will do a sanity check on these parameters. printf is no exclusion: If it detects an empty pointer, it will handle that case and return silently. However, when providing something different, there is no way to know if it is not really a valid address, so printf needs to do its job, getting interrupted by the kernel when trying to address the memory at 0x00000001 with a segmentation fault.
This was the long explanation.
And by the way: This only works for 32-bit pointers (so compiled as 32-bit binary). The same would be true for long int and pointers on 64-bit machines. However, this is a matter of the compiler, how it converts the expression you provide (normally, an int value of 0 is implicitly cast to a long int value of 0 and then used as a pointer address when assigned, but vice versa won't work without an explicit cast).
Why does this code below compiles and executes but doesn't print anything in output?
printf(i = 0);
The question embodies a false premise. On any modern C compiler, this clearly-erroneous code generates at least one error.

Difference between memory addresses of variables is constant

I ran the following code and this is the output I got:
#include <stdio.h>
int main()
{
int x = 3;
int y = x;
printf("%d\n", &x);
printf("%d\n", &y);
getchar();
return 0;
}
Output:
3078020
3078008
Now, the output changes every time I run the program, but the difference between the location of x to the location of y is always 12. I wondered why.
Edit: I understand why the difference is constant. What I don't understand is why the difference is specifically 12, and why the memory address of y, who's defined later, is less than x's.
The addresses themselves may well change due to protective measures such as address space layout randomisation (ASLR).
This is something done to mitigate the possibility of an attack vector on code. If it always loads at the same address, it's more likely that a successful attack can be made.
By moving the addresses around each time it loads, it becomes much more difficult for an attack to work everywhere (or anywhere, really).
However, the fact that both variables will be on the stack in a single stack frame (assuming the implementation even uses a stack which is by no means guaranteed), the difference between them will be very unlikely to change.
In your code,
printf("%d\n", &x);
is not the correct way to print a pointer value. What you need is
printf("%p\n", (void *)&x);
That said, the difference between the addresses is constant because those two variables are usually placed in successive memory locations in stack. However, AFAIK this is not guranteed in c standard. Only thing guraneted is sizeof(a) [size of a datatype] will be fixed for a particular platform.
To print the address of a variable you have to use the %p not %d.
To print the integer value only you have to give %d.
printf("%d\n",x);// It will print the value of x
printf("%p\n",(void*)&x);// It will print the address of x.
From the man page of printf
p The void * pointer argument is printed in hexadecimal

How is memory allocated in an array of integer pointers?

I was practicing C programming and trying to create a 2D array with fixed rows, but variable columns. So, I used "array of pointers" concept i.e. I created an array such as int* b[4].
This is the code which was written:
#include <stdio.h>
int main(void) {
int* b[4];
int c[]={1,2,3};
int d[]={4,5,6,7,8, 9};
int e[]={10};
int f[]={11, 12, 13};
b[0]=c;
b[1]=d;
b[2]=e;
b[3]=f;
//printing b[0][0] to b[0][2] i.e. c[0] to c[2]
printf("b[0][0]= %d\tb[0][1]=%d\tb[0][2]=%d\n", b[0][0], b[0][1], b[0][2]);
//printing b[1][0] to b[1][5] i.e. d[0] to d[5]
printf("b[1][0]= %d\tb[1][1]=%d\tb[1][2]=%d\tb[1][3]=%d\tb[1][4]=%d\tb[1][5]=%d\n", b[1][0], b[1][1], b[1][2], b[1][3], b[1][4], b[1][5]);
//printing b[2][0] i.e. e[0]
printf("b[2][0]= %d\n", b[2][0]);
//printing b[3][0] to b[3][2] i.e. f[0] to f[2]
printf("b[3][0]= %d\tb[3][1]=%d\tb[3][2]=%d\n", b[3][0], b[3][1], b[3][2]);
return 0;
}
and the output was as expected:
b[0][0]= 1 b[0][1]=2 b[0][2]=3
b[1][0]= 4 b[1][1]=5 b[1][2]=6 b[1][3]=7 b[1][4]=8 b[1][5]=9
b[2][0]= 10
b[3][0]= 11 b[3][1]=12 b[3][2]=13
So, I think memory has been allocated this way:
But, question chimes in when this code is executed:
#include <stdio.h>
int main(void) {
int* b[4];
int c[]={1,2,3};
int d[]={4,5,6,7,8, 9};
int e[]={10};
int f[]={11, 12, 13};
b[0]=c;
b[1]=d;
b[2]=e;
b[3]=f;
int i, j;
for (i=0; i<4; i++)
{
for (j=0; j<7; j++)
{
printf("b[%d][%d]= %d ", i, j, b[i][j]);
}
printf("\n");
}
return 0;
}
And the output is something unusual:
b[0][0]= 1 b[0][1]= 2 b[0][2]= 3 b[0][3]= 11 b[0][4]= 12 b[0][5]= 13 b[0][6]= -1079668976
b[1][0]= 4 b[1][1]= 5 b[1][2]= 6 b[1][3]= 7 b[1][4]= 8 b[1][5]= 9 b[1][6]= -1216782128
b[2][0]= 10 b[2][1]= 1 b[2][2]= 2 b[2][3]= 3 b[2][4]= 11 b[2][5]= 12 b[2][6]= 13
b[3][0]= 11 b[3][1]= 12 b[3][2]= 13 b[3][3]= -1079668976 b[3][4]= -1079668936 b[3][5]= -1079668980 b[3][6]= -1079668964
One can observe that b[0][i] continues seeking values from b[3][i], array b[2][i] continues seeking values from b[0][i] followed by a[3][i], array b[3][i] and b1[i] terminate.
Every time when this program is executed, the same pattern is followed. So, is there something more on the way memory is allocated, or is this a mere co-incidence?
As Hrishi notes in comments, the reason this is happening is that you're trying to access beyond the end of your arrays. So what's actually happening?
The short version is that you're reading past the end of your arrays, and reading into the next array (Or into unallocated memory). But why is this happening?
A brief aside on C-style arrays
In C, arrays are just pointers1. b is a pointer to the start of the array, so *b will return the first element of the array (Which in this case is a pointer to the start of b[0].
The syntax b[i] is just syntactic sugar; it's the same as *(b + i), which is doing pointer arithmetic. It's literally saying: "The memory address i places after b; tell me what's pointing there"2.
So if we look at, for example, b[0][3], we can translate that into *((*b) + 3): you're getting the address of the start of b, and then getting whatever is stored three memory address away from that.
So what's happening to you?
As it happens, your computer has stored b[3] starting at that address. That's what this is really telling you: where your computer is placing each sub-array in memory. This is because arrays are always laid out contiguously, one position right after another in memory (That's how the pointer arithmetic trick works). But because you defined c, d, e, and f individually, the memory manager did not allocate them contiguous to one another, but instead just put them wherever it wanted. The resulting pattern is just what it came up with. As best I can tell, your arrays are laid out in memory like this:
--------
| e[0] |
--------
| c[0] |
--------
| c[1] |
--------
| c[2] |
--------
| f[0] |
--------
| f[1] |
--------
| f[2] |
--------
d is located somewhere in memory as well, but it could be before or after this contiguous block; we don't know.
However you can't rely on this. As I mention in a footnote, the ordering of allocated memory is not defined by the language, so it could (And does) change depending on any number of factors. Run this same code tomorrow, and it probably won't be exactly the same.
The next obvious question is: "What about b[0][6]? Why is that such a weird number?"
The answer is that you've run out of array, and you're now trying to read from unallocated memory.
When your program gets run, the operating system gives it a certain chunk of memory and says "Here, do with that whatever you like." When you declare a local variable on the stack (As you have here) or on the heap (With malloc), the memory manager grabs some of that memory and gives it back to you4. All the memory you're not currently using is still there, but you have no idea what is stored there; it's just leftover data from whatever was last using that particular chunk of memory. Reading this is also undefined behaviour in C, because you obviously have no control over what is stored in that memory.
I should note that most other languages (Java, for instance) wouldn't allow you to do anything like this; it would throw an exception because you're trying to access beyond the bounds of an array. C, however, isn't that smart. C likes to give you enough rope to hang yourself, so you need to do your own bounds checking.
1 This is a simplification. The truth is slightly more complicated
2 This implementation is why array indices start at 0.
3 This is an example of undefined behaviour, which is Very Bad. Basically it means that this result isn't consistent. It's happening the same way every time, on your computer, right now. Try it on a friend's computer, or even on your computer an hour from now, and you might get something completely different.
4 This is another oversimplification, but for your purposes it's close enough to true
Your little drawing is right, the only thing is that since you sequentially declared the arrays in your function, they're all in the stack, side by side. So, by accessing beyond your arrays' limits you're accessing the next array.
Compile with all warnings & debug info (gcc -Wall -Wextra -g). Then use the debugger (gdb). Beware of undefined behavior (UB).
Your b[2] is e which is an array of one element. At some time you are accessing b[2][3]. This is a buffer overflow (an instance of UB). What really happens is implementation specific (can vary with the compiler, its version, the ABI, the processor, the kernel, the moon, the compiler flags, ...) You may want to study the assembled code to understand more (gcc -fverbose-asm -S).
BTW, you should not suppose that arrays c, d, e, f have some particular memory layout.
When you print the values of the array elements using the specific location addresses you get the exact array values . but when you execute the same program you get garbage values as in c language we have no bond checking in C.Thus when you try accessing the value of a location that is beyond the memory scope utilized by you all you get is the data stored on that memory which is also referred as garbage value . So to get proper result you need to keep a check that accesses the values that are in array bond or say in the limit defined for that array.

How is this loop ending and are the results deterministic?

I found some code and I am baffled as to how the loop exits, and how it works. Does the program produce a deterministic output?
The reason I am baffled is:
1. `someArray` is of size 2, but clearly, the loop goes till size 3,
2. The value is deterministic and it always exits `someNumber` reaches 4
Can someone please explain how this is happening?
The code was not printing correctly when I put angle brackets <> around include's library names.
#include <stdlib.h>
#include <time.h>
#include <stdio.h>
int main() {
int someNumber = 97;
int someArray[2] = {0,1};
int findTheValue;
for (findTheValue=0; (someNumber -= someArray[findTheValue]) >0; findTheValue++) {
}
printf("The crazy value is %d", findTheValue);
return EXIT_SUCCESS;
}
Accessing an array element beyond its bounds is undefined behavior. That is, the program is allowed to do anything it pleases, reply 42, eat your hard disk or spend all your money. Said in other words what is happening in such cases is entirely platform dependent. It may look "deterministic" but this is just because you are lucky, and also probably because you are only reading from that place and not writing to it.
This kind of code is just bad. Don't do that.
Depending on your compiler, someArray[2] is a pointer to findTheValue!
Because these variables are declared one-after-another, it's entirely possible that they would be positioned consecutively in memory (I believe on the stack). C doesn't really do any memory management or errorchecking, so someArray[2] just means the memory at someArray[0] + 2 * sizeof(int).
So when findTheValue is 0, we subtract, then when findTheValue is 1, we subtract 1. When findTheValue is 2, we subtract someNumber (which is now 94) and exit.
This behavior is by no means guaranteed. Don't rely on it!
EDIT: It is probably more likely that someArray[2] just points to garbage (unspecified) values in your RAM. These values are likely more than 93 and will cause the loop to exit.
EDIT2: Or maybe someArray[2] and someArray[3] are large negative numbers, and subtracting both causes someNumber to roll over to negative.
The loop exits because (someNumber -= someArray[findTheValue]) doesnt set.
Adding a debug line, you can see
value 0 number 97 array 0
value 1 number 96 array 1
value 2 number 1208148276 array -1208148180
that is printing out findTheValue, someNumber, someArray[findTheValue]
Its not the answer I would have expected at first glance.
Checking addresses:
printf("&someNumber = %p\n", &someNumber);
printf("&someArray[0] = %p\n", &someArray[0]);
printf("&someArray[1] = %p\n", &someArray[1]);
printf("&findTheValue = %p\n", &findTheValue);
gave this output:
&someNumber = 0xbfc78e5c
&someArray[0] = 0xbfc78e50
&someArray[1] = 0xbfc78e54
&findTheValue = 0xbfc78e58
It seems that for some reason the compiler puts the array in the beginning of the stack area, then the variables that are declared below and then those that are above in the order they are declared. So someArray[3] effectively points at someNumber.
I really do not know the reason, but I tried gcc on Ubuntu 32 bit and Visual Studio with and without optimisation and the results were always similar.

Resources