How is memory allocated in an array of integer pointers? - c

I was practicing C programming and trying to create a 2D array with fixed rows, but variable columns. So, I used "array of pointers" concept i.e. I created an array such as int* b[4].
This is the code which was written:
#include <stdio.h>
int main(void) {
int* b[4];
int c[]={1,2,3};
int d[]={4,5,6,7,8, 9};
int e[]={10};
int f[]={11, 12, 13};
b[0]=c;
b[1]=d;
b[2]=e;
b[3]=f;
//printing b[0][0] to b[0][2] i.e. c[0] to c[2]
printf("b[0][0]= %d\tb[0][1]=%d\tb[0][2]=%d\n", b[0][0], b[0][1], b[0][2]);
//printing b[1][0] to b[1][5] i.e. d[0] to d[5]
printf("b[1][0]= %d\tb[1][1]=%d\tb[1][2]=%d\tb[1][3]=%d\tb[1][4]=%d\tb[1][5]=%d\n", b[1][0], b[1][1], b[1][2], b[1][3], b[1][4], b[1][5]);
//printing b[2][0] i.e. e[0]
printf("b[2][0]= %d\n", b[2][0]);
//printing b[3][0] to b[3][2] i.e. f[0] to f[2]
printf("b[3][0]= %d\tb[3][1]=%d\tb[3][2]=%d\n", b[3][0], b[3][1], b[3][2]);
return 0;
}
and the output was as expected:
b[0][0]= 1 b[0][1]=2 b[0][2]=3
b[1][0]= 4 b[1][1]=5 b[1][2]=6 b[1][3]=7 b[1][4]=8 b[1][5]=9
b[2][0]= 10
b[3][0]= 11 b[3][1]=12 b[3][2]=13
So, I think memory has been allocated this way:
But a question arises when this code is executed:
#include <stdio.h>
int main(void) {
int* b[4];
int c[]={1,2,3};
int d[]={4,5,6,7,8, 9};
int e[]={10};
int f[]={11, 12, 13};
b[0]=c;
b[1]=d;
b[2]=e;
b[3]=f;
int i, j;
for (i=0; i<4; i++)
{
for (j=0; j<7; j++)
{
printf("b[%d][%d]= %d ", i, j, b[i][j]);
}
printf("\n");
}
return 0;
}
And the output is something unusual:
b[0][0]= 1 b[0][1]= 2 b[0][2]= 3 b[0][3]= 11 b[0][4]= 12 b[0][5]= 13 b[0][6]= -1079668976
b[1][0]= 4 b[1][1]= 5 b[1][2]= 6 b[1][3]= 7 b[1][4]= 8 b[1][5]= 9 b[1][6]= -1216782128
b[2][0]= 10 b[2][1]= 1 b[2][2]= 2 b[2][3]= 3 b[2][4]= 11 b[2][5]= 12 b[2][6]= 13
b[3][0]= 11 b[3][1]= 12 b[3][2]= 13 b[3][3]= -1079668976 b[3][4]= -1079668936 b[3][5]= -1079668980 b[3][6]= -1079668964
One can observe that b[0][i] continues seeking values from b[3][i], and b[2][i] continues seeking values from b[0][i] followed by b[3][i], while b[3][i] and b[1][i] terminate.
Every time this program is executed, the same pattern is followed. So, is there something more to the way memory is allocated, or is this a mere coincidence?

As Hrishi notes in comments, the reason this is happening is that you're trying to access beyond the end of your arrays. So what's actually happening?
The short version is that you're reading past the end of your arrays, and reading into the next array (Or into unallocated memory). But why is this happening?
A brief aside on C-style arrays
In C, arrays are just pointers[1]. b is a pointer to the start of the array, so *b will return the first element of the array (which in this case is b[0], itself a pointer to the start of c).
The syntax b[i] is just syntactic sugar; it's the same as *(b + i), which is doing pointer arithmetic. It's literally saying: "Go to the memory address i places after b and tell me what's stored there"[2].
So if we look at, for example, b[0][3], we can translate that into *((*b) + 3): you're taking the pointer stored in b[0] (the address of c[0]), and then reading whatever is stored three int-sized positions past that address.
So what's happening to you?
As it happens, your computer has stored f (the array that b[3] points to) starting at that address. That's what this is really telling you: where your computer is placing each sub-array in memory. The elements of a single array are always laid out contiguously, one position right after another in memory (that's how the pointer arithmetic trick works). But because you defined c, d, e, and f individually, the compiler was under no obligation to place them next to one another or in any particular order; it just put them wherever it wanted. The resulting pattern is just what it came up with. As best I can tell, your arrays are laid out in memory like this:
--------
| e[0] |
--------
| c[0] |
--------
| c[1] |
--------
| c[2] |
--------
| f[0] |
--------
| f[1] |
--------
| f[2] |
--------
d is located somewhere in memory as well, but it could be before or after this contiguous block; we don't know.
However you can't rely on this. As I mention in a footnote, the ordering of allocated memory is not defined by the language, so it could (And does) change depending on any number of factors. Run this same code tomorrow, and it probably won't be exactly the same.
The next obvious question is: "What about b[0][6]? Why is that such a weird number?"
The answer is that you've run out of array, and you're now trying to read from unallocated memory.
When your program gets run, the operating system gives it a certain chunk of memory and says "Here, do with that whatever you like." When you declare a local variable on the stack (as you have here) or allocate on the heap (with malloc), the memory manager grabs some of that memory and gives it back to you[4]. All the memory you're not currently using is still there, but you have no idea what is stored there; it's just leftover data from whatever was last using that particular chunk of memory. Reading this is also undefined behaviour in C, because you obviously have no control over what is stored in that memory.
I should note that most other languages (Java, for instance) wouldn't allow you to do anything like this; it would throw an exception because you're trying to access beyond the bounds of an array. C, however, isn't that smart. C likes to give you enough rope to hang yourself, so you need to do your own bounds checking.
[1] This is a simplification. The truth is slightly more complicated.
[2] This implementation is why array indices start at 0.
[3] This is an example of undefined behaviour, which is Very Bad. Basically it means that this result isn't consistent. It's happening the same way every time, on your computer, right now. Try it on a friend's computer, or even on your computer an hour from now, and you might get something completely different.
[4] This is another oversimplification, but for your purposes it's close enough to true.

Your little drawing is right, the only thing is that since you sequentially declared the arrays in your function, they're all in the stack, side by side. So, by accessing beyond your arrays' limits you're accessing the next array.

Compile with all warnings & debug info (gcc -Wall -Wextra -g). Then use the debugger (gdb). Beware of undefined behavior (UB).
Your b[2] is e which is an array of one element. At some time you are accessing b[2][3]. This is a buffer overflow (an instance of UB). What really happens is implementation specific (can vary with the compiler, its version, the ABI, the processor, the kernel, the moon, the compiler flags, ...) You may want to study the assembled code to understand more (gcc -fverbose-asm -S).
BTW, you should not suppose that arrays c, d, e, f have some particular memory layout.

When you print the array elements using indices that actually exist, you get the exact array values. But when you run the second program you get garbage values, because C has no bounds checking. When you try to access a location beyond the memory that belongs to your array, all you get is whatever data happens to be stored there, which is also referred to as a garbage value. So to get proper results, you need to check that every access stays within the bounds defined for that array.

Related

expected output and the practical output not matching, please explain the logic behind the code

#include<stdio.h>
int main()
{
char *str[] = {"Frogs","Do","Not","Die.","They","Croak"};
printf("%c %c %c",*str[0],*str[1],*str[2]);//expected F D N
printf("\n%u %u %u",str[0],str[1],str[2]);//expected 1000 1006 1003
}
The expected output is based on the assumption that "Frogs" begins at address 1000.
The actual output is as follows:
F D N
2162395060 2162395057 2162395053
How can that be possible? Here the addresses decrease from str[0] to str[2], and printing the addresses of str[3], str[4] and str[5] shows no pattern, only abrupt changes in the addresses.
You are printing the addresses of three string constants. The compiler is under no obligation to organize the string constants in any predictable fashion.
The compiler is required to provide an array of pointers. The array can be accessed sequentially to obtain addresses of the string constants, but the string constants may be stored in any location which the compiler deems efficient or useful.
I ran the same code on macOS using AppleClang 10.0.0.10001044 and got the following output:
F D N
104431486 104431492 104431495
As you can see, the pointers are sequential using AppleClang.
However, that is irrelevant. Nothing in your code should depend on how the compiler chooses to allocate memory for the string constants.

Why am I not getting a segfault error with this simple code?

I have to show an error when I access an item outside of an array (without creating my own function for it). So I just thought it was necessary to access the value out of the array to trigger a segfault but this code does not crash at all:
int main(){
int tab[4];
printf("%d", tab[7]);
}
Why can't I get an error when I do this?
When you invoke undefined behavior, anything can happen. Your program may crash, it may display strange results, or it may appear to work properly.
Also, making a seemingly unrelated change such as adding an unused local variable or a simple call to printf can change the way in which undefined behavior manifests itself.
When I ran this program, it completed and printed 63. If I changed the referenced index from 7 to 7000, I get a segfault.
In short, just because the program can crash doesn't mean it will.
Because the behavior when you do things not allowed by the spec is "undefined". And because there are no bounds checks required in C. You got "lucky".
int tab[4]; says to allocate memory for 4 integers on the stack. tab is just a number: a memory address. It doesn't know anything about what it's pointing at or how much space has been allocated.
printf("%d", tab[7]); says to print out the 8th element of tab. So the compiler does...
tab is set to 1000 (for example) meaning memory address 1000.
tab represents a list of int, so each element will be sizeof(int), probably 4 or 8 bytes. Let's say 8.
Therefore tab[7] means to start reading at memory position (7 * 8) + 1000 = 1056 and for 8 more bytes. So it reads 1056 to 1063.
That's it. No bounds checks by the program itself. The hardware or OS might do a bounds check to prevent one process from reading arbitrary memory, have a look into protected memory, but nothing required by C.
So tab[7] faithfully reproduces whatever garbage is in 1056 to 1063.
You can write a little program to see this.
#include <stdio.h>
#include <stdint.h>
int main(void){
    int tab[4];
    printf("sizeof(int): %zu\n", sizeof(int));
    /* Addresses are converted to integers so they print as plain numbers. */
    printf("tab: %ju\n", (uintmax_t)(uintptr_t)tab);
    printf("&tab[7]: %ju\n", (uintmax_t)(uintptr_t)&tab[7]);
    /* Note: tab must be cast to an integer else C will do pointer
       math on it. `7 + tab` gives the same result. */
    printf("(7 * sizeof(int)) + (uintptr_t)tab: %ju\n", (uintmax_t)((7 * sizeof(int)) + (uintptr_t)tab));
    printf("7 + tab: %ju\n", (uintmax_t)(uintptr_t)(7 + tab));
}
The exact results will vary, but you'll see that &tab[7] is just some math done on tab to figure out what memory address to examine.
$ ./test
sizeof(int): 4
tab: 1595446448
&tab[7]: 1595446476
(7 * sizeof(int)) + (uintptr_t)tab: 1595446476
7 + tab: 1595446476
1595446476 - 1595446448 is 28. 7 * 4 is 28.
An array in C is essentially a block of memory with a starting point at, in this case, the arbitrary location of tab[0]. Sure, you've set a bound of 4, but if you go past that, you're just accessing whatever values lie past that block of memory (which is probably why it prints weird numbers).

why my code has not a limit?

That is my code:
#include<stdio.h>
int main()
{
int vet[10], i;
for(i=30; i<=45; i++)
{
scanf("%d", &vet[i]);
}
for(i=30; i<=45; i++)
printf(" %d ", vet[i]);
for(i=30; i<=45; i++)
printf(" %x", &vet[i]);
return 0;
}
I declared just 10 positions of int type in memory, but I get more, so what happened?
Is it a memory overflow?
And is %x the correct format to print a memory address?
The input was:
1
2
3
4
5
6
7
8
9
10 /*It was to be stoped right here !?*/
11
12
13
14
15
16
and returned:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 /*I put space to indent*/
22ff6c 22ff70 22ff74 22ff78 22ff7c 22ff80 22ff84 22ff88 22ff8c 22ff90 22ff94 22ff98 22ff9c 22ffa0 22ffa4 22ffa8
The C language does not check bounds when you access arrays for reading or writing. It is up to the program author to ensure that the program accesses only valid array elements.
In this case, you wrote values to memory addresses outside your declared array. While you may sometimes get a segmentation violation (SIGSEGV) in this case, you may just get "lucky" -- really, unlucky -- and not encounter any problems at runtime.
C doesn't enforce array boundaries. Keeping within the limits is your responsibility in that language - it will let you do plainly wrong things, but it may crash at runtime.
Not only does the C language not check bounds on array accesses with respect to array size, which explains why you are successfully writing to the array 16 times, but C also does not have a mechanism for converting your range of 30 to 45 into the range of the first 10 (or 16?) elements of the array.
So, you are really attempting to write to the 31st through 46th element of the array vet, which has only 10 elements.
C is perfectly happy to let you read from and write to an array past the bounds you set (10, in this case).
Reading past the limit just gives you garbage; writing past it will do all kinds of crazy things and generally crash your program (or, if you are unlucky, overwrite your entire hard drive).
You were lucky with this program, but you should not keep doing that. In C, you are responsible for enforcing the limits of your arrays yourself.
int vet[10] declares a block of ten integers in memory. These memory locations are accessed via vet[0] through vet[9]. Any other access to memory through vet is undefined behavior. Absolutely anything could be within that memory, and you can easily corrupt the rest of your program execution. The compiler trusts you to know better than what you were doing.
As @NigelHarper correctly points out, %p is the official way of printing pointers. It typically prints in hexadecimal. Pointers could print in decimal, but the number itself is meaningless. Hexadecimal makes the printing more concise, and just as easy to see differences from one address to the next.
In practice %x often appears to work for printing a pointer, since all it does is take a value and print it in hexadecimal form, but passing a pointer where %x expects an unsigned int is not guaranteed to work (for example, where pointers are wider than int). Casting to void * and using %p is the safe choice.
C does not do bounds checking on arrays and you are accessing an array out of bounds. The possible valid indexes in the array are [0,9], but you are accessing [30,45].
You should modify your code to only access valid indexes:
int SIZE = 10;
int vet[SIZE];
//...
// not for( i = 30; i <= 45; i++ )
for( i = 0; i < SIZE; ++i ) { /* ... */ }
The C language has no support for checking out-of-bounds array accesses. An out-of-bounds access may generate a segmentation fault that terminates your process, but it is not guaranteed to (the same holds for raw arrays in C++). Since C doesn't check, what you saw is not surprising.

C memory management in gcc

I am using gcc version 4.7.2 on Ubuntu 12.10 x86_64.
First of all these are the sizes of data types on my terminal:
sizeof(char) = 1
sizeof(short) = 2 sizeof(int) = 4
sizeof(long) = 8 sizeof(long long) = 8
sizeof(float) = 4 sizeof(double) = 8
sizeof(long double) = 16
Now please have a look at this code snippet:
int main(void)
{
char c = 'a';
printf("&c = %p\n", &c);
return 0;
}
If I am not wrong we can't predict anything about the address of c. But each time this program gives some random hex address ending in f. So the next available location will be some hex value ending in 0.
I observed this pattern in case of other data types too. For an int value the address was some hex value ending in c. For double it was some random hex value ending in 8 and so on.
So I have 2 questions here.
1) Who is governing this kind of memory allocation ? Is it gcc or C standard ?
2) Whoever it is, Why it's so ? Why the variable is stored in such a way that next available memory location starts at a hex value ending in 0 ? Any specific benefit ?
Now please have a look at this code snippet:
int main(void)
{
double a = 10.2;
int b = 20;
char c = 30;
short d = 40;
printf("&a = %p\n", &a);
printf("&b = %p\n", &b);
printf("&c = %p\n", &c);
printf("&d = %p\n", &d);
return 0;
}
Now here what I observed is completely new for me. I thought the variable would get stored in the same order they are declared. But No! That's not the case. Here is the sample output of one of random run:
&a = 0x7fff8686a698
&b = 0x7fff8686a694
&c = 0x7fff8686a691
&d = 0x7fff8686a692
It seems that variables get sorted in increasing order of their sizes and then they are stored in the same sorted order but with maintaining the observation 1. i.e. the last variable (largest one) gets stored in such a way that the next available memory location is an hex value ending in 0.
Here are my questions:
3) Who is behind this ? Is it gcc or C standard ?
4) Why to waste the time in sorting the variables first and then allocating the memory instead of directly allocating the memory on 'first come first serve' basis ? Any specific benefit of this kind of sorting and then allocating memory ?
Now please have a look at this code snippet:
int main(void)
{
char array1[] = {1, 2};
int array2[] = {1, 2, 3};
printf("&array1[0] = %p\n", &array1[0]);
printf("&array1[1] = %p\n\n", &array1[1]);
printf("&array2[0] = %p\n", &array2[0]);
printf("&array2[1] = %p\n", &array2[1]);
printf("&array2[2] = %p\n", &array2[2]);
return 0;
}
Now this is also shocking for me. What I observed is that the array is always stored at some random hex value ending in '0' if the array has 2 or more elements, and if it has fewer than 2 elements then it gets a memory location following observation 1.
So here are my questions:
5) Who is behind this storing an array at some random hex value ending at 0 thing ? Is it gcc or C standard ?
6) Now why to waste the memory ? I mean array2 could have been stored immediately after array1 (and hence array2 would have memory location ending at 2). But instead of that array2 is stored at next hex value ending at 0 thereby leaving 14 memory locations in between. Any specific benefits ?
The address at which the stack and the heap start is given to the process by the operating system. Everything else is decided by the compiler, using offsets that are known at compile time. Some of these things may follow an existing convention followed in your target architecture and some of these do not.
The C standard does not mandate anything regarding the order of the local variables inside the stack frame (as pointed out in a comment, it doesn't even mandate the use of a stack at all). The standard only bothers to define order when it comes to structs and, even then, it does not define specific offsets, only the fact that these offsets must be in increasing order. Usually, compilers try to align the variables in such a way that access to them takes as few CPU instructions as possible - and the standard permits that, without mandating it.
Some of these choices are mandated by the application binary interface (ABI) specifications for your system and processor.
See the x86 calling conventions and the SVR4 x86-64 ABI supplement (I'm giving the URL of a recent copy; the latest original is surprisingly hard to find on the Web).
Within a given call frame, the compiler could place variables in arbitrary stack slots. It may try (when optimizing) to reorganize the stack at will, e.g. by decreasing alignment constraints. You should not worry about that.
A compiler tries to put local variables at stack locations with suitable alignment. See the __alignof__ extension of GCC. Where exactly the compiler puts these variables is not important, see my answer here. (If it is important to your code, you really should pack the variables in a single common local struct, since each compiler, version and set of optimization flags could do different things; so don't depend on the precise behavior of your particular compiler.)

How is this loop ending and are the results deterministic?

I found some code and I am baffled as to how the loop exits, and how it works. Does the program produce a deterministic output?
The reason I am baffled is:
1. `someArray` is of size 2, but clearly, the loop goes till size 3,
2. The value is deterministic and it always exits when `someNumber` reaches 4
Can someone please explain how this is happening?
The code was not printing correctly when I put angle brackets <> around include's library names.
#include <stdlib.h>
#include <time.h>
#include <stdio.h>
int main() {
int someNumber = 97;
int someArray[2] = {0,1};
int findTheValue;
for (findTheValue=0; (someNumber -= someArray[findTheValue]) >0; findTheValue++) {
}
printf("The crazy value is %d", findTheValue);
return EXIT_SUCCESS;
}
Accessing an array element beyond its bounds is undefined behavior. That is, the program is allowed to do anything it pleases, reply 42, eat your hard disk or spend all your money. Said in other words what is happening in such cases is entirely platform dependent. It may look "deterministic" but this is just because you are lucky, and also probably because you are only reading from that place and not writing to it.
This kind of code is just bad. Don't do that.
Depending on your compiler, someArray[2] may refer to the same memory as findTheValue!
Because these variables are declared one after another, it's entirely possible that they would be positioned consecutively in memory (I believe on the stack). C doesn't really do any memory management or error checking, so someArray[2] just means the memory at the address of someArray[0] plus 2 * sizeof(int) bytes.
So when findTheValue is 0, we subtract 0; then when findTheValue is 1, we subtract 1. When findTheValue is 2, we subtract findTheValue itself (2), which leaves 94. If someArray[3] in turn happens to be someNumber, then when findTheValue is 3 we subtract someNumber (which is now 94) and exit.
This behavior is by no means guaranteed. Don't rely on it!
EDIT: It is probably more likely that someArray[2] just points to garbage (unspecified) values in your RAM. These values are likely more than 93 and will cause the loop to exit.
EDIT2: Or maybe someArray[2] and someArray[3] are large negative numbers, and subtracting both causes someNumber to roll over to negative.
The loop exits when (someNumber -= someArray[findTheValue]) is no longer greater than 0.
Adding a debug line, you can see
value 0 number 97 array 0
value 1 number 96 array 1
value 2 number 1208148276 array -1208148180
that is printing out findTheValue, someNumber, someArray[findTheValue]
It's not the answer I would have expected at first glance.
Checking addresses:
printf("&someNumber = %p\n", &someNumber);
printf("&someArray[0] = %p\n", &someArray[0]);
printf("&someArray[1] = %p\n", &someArray[1]);
printf("&findTheValue = %p\n", &findTheValue);
gave this output:
&someNumber = 0xbfc78e5c
&someArray[0] = 0xbfc78e50
&someArray[1] = 0xbfc78e54
&findTheValue = 0xbfc78e58
It seems that for some reason the compiler puts the array in the beginning of the stack area, then the variables that are declared below and then those that are above in the order they are declared. So someArray[3] effectively points at someNumber.
I really do not know the reason, but I tried gcc on Ubuntu 32 bit and Visual Studio with and without optimisation and the results were always similar.
