Pointer arithmetic on "neighbour" declared variables - c

Could someone explain why the following code behaves differently depending on whether the first three printf calls are present?
#include <stdlib.h>
#include <stdio.h>
int main(){
int a = 1;
int b = 2;
int c = 3;
int* a_ptr = &a;
// comment out the following 3 lines
printf(" &a : %p\n", &a);
printf(" &b : %p\n", &b);
printf(" &c : %p\n", &c);
printf("*(a_ptr) : %d\n", *(a_ptr));
printf("*(a_ptr-1) : %d\n", *(a_ptr-1));
printf("*(a_ptr-2) : %d\n", *(a_ptr-2));
return 0;
}
I was playing around to learn how variables are laid out in memory depending on the order of declaration. By printing the addresses, I see they are one int apart if they are declared one after another. After printing the addresses, I subtract 1 and 2 from the address of a and dereference the result. When I print it, it shows what I'd expect: the values of a, b and c. But if I do not print the addresses first, I just get a=1, while b and c seem random to me. What is going on here? Do you get the same behaviour on your machine?
// without printing the addresses
*(a_ptr) : 1
*(a_ptr-1) : 4200880
*(a_ptr-2) : 6422400
// with printing
&a : 0061FF18
&b : 0061FF14
&c : 0061FF10
*(a_ptr) : 1
*(a_ptr-1) : 2
*(a_ptr-2) : 3

The C standard does not define what happens in the code you show, notably because the address arithmetic a_ptr-1 and a_ptr-2 is not defined. For an array of n elements, pointer arithmetic is defined only for adjusting pointers to locations from that corresponding to index 0 to that corresponding to index n (which is one beyond the last element of the array). For a single object, pointer arithmetic is defined as if it were an array of one element, so only for adjusting a pointer to the locations corresponding to index 0 and index 1 (just beyond the object). a_ptr-1 and a_ptr-2 would point to elements at indices −1 and −2, and the C standard does not define the behavior for these.
However, what is happening in the experiments you tried is:
When the addresses of a, b, and c are printed, the compiler has to ensure there are addresses for it to print. So it assigns memory locations to a, b, and c. It happens that these are consecutive in memory and in reverse order, and therefore a_ptr-1 happened to point to b and a_ptr-2 happened to point to c. (However, compilers may assign locations in memory based on alphabetical order of the names rather than declaration order, based on alignment and size and other properties, based on some arbitrary hash it uses to organize the names in a data structure, and/or other factors. In particular, if you compiled without optimization, and you request high optimization instead, the order may change.)
When the addresses are not printed, the compiler has no need to assign memory locations for b and c because they are not used in the code. In this case, a_ptr-1 and a_ptr-2 do not point to b and c but point to locations in memory that have been used for other purposes.

Related

number of elements exceed the size declared in array in c

int main(void)
{
int b[5] = {1,2,3,4,5,6,7,8,9,10};
int i;
for(i=0;i<9;i++)
{
printf(" %d ",b[i]);
}
printf("\n address : %u",b);
return 0;
}
This is a C program in which the number of elements exceeds the declared size. When I iterate through the array, it prints the following output:
1 2 3 4 5 5 6422284 3854336 6422352
address : 6422288
Compiler: gcc
I don't understand:
why 5 is printed twice
why only 9 values are printed instead of 10
Because the behavior of the program is not defined, and thus it will do whatever. Your expectation that “5 should not be printed twice” is just as speculative as any other expectation. Why do you think 5 should not be printed twice?!
Because the behavior of the program is not defined. Thus your expectation as to the behavior of the loop is just idle speculation.
You’ll see even more interesting things happen if you do a release (optimized) build. It may do nothing whatsoever, or crash, or do the same thing, or do something else out of the blue if the CPU’s IP gets corrupted or sent into the “blue”.
And by the way, your original program could too :)
The C language specification contains the following constraint (C17 6.7.9/2):
No initializer shall attempt to provide a value for an object not
contained within the entity being initialized.
Your initialization of b ...
int b[5] = {1,2,3,4,5,6,7,8,9,10};
... violates that constraint by including ten values in b's initializer list when b has only five elements, all scalars. A conforming compiler is obligated to emit a diagnostic about that, and if gcc does not do so by default then it probably can be made to do so by including the -pedantic flag. Generally speaking, with gcc you should use both -Wall and -pedantic at least until you know enough to make an informed decision of your own.
Technically, the behavior of your program is undefined as a result, but that's probably not the main issue in practice. I expect that gcc is just ignoring the excess initializer values. The main issue is that no matter how many initializer elements you provide, you have specified that b has exactly 5 elements, but you attempt to access elements at indices above 4. Those are out of bounds array accesses, and their behavior is undefined. You have no valid reason to expect any particular results of those, or that anything be printed at all.
In fact, the whole program has undefined behavior as a result, so it is not safe to assume that a program that performed all the same things except the out-of-bounds accesses would produce a subset of the output of the erroneous program.
With respect to your particular questions:
why 5 is printed twice
The language specification does not say. The result of attempting to print b[5] is undefined.
why only 9 values are printed instead of 10
Probably because of the iteration bounds, i=0;i<9;i++. With those, and if the program otherwise conformed to the language specification, one would expect the loop body to be executed nine times (in particular, not when i has the value 9).
Note also that this ...
printf("\n address : %u",b);
... is wrong. The printf directive for printing pointers is %p, and, technically, it requires the argument to be a pointer to void (not, for example, a pointer to int). This would be fully correct:
printf("\n address : %p", (void *) b);
Addendum
If you want an array with exactly ten elements, then declare it so:
int b[10] /* initializer optional */;
If you want one whose size is chosen based on the number of elements in its initializer, then omit the explicit size:
int b[] = {1,2,3,4,5,6,7,8,9,10}; /* declares a 10-element array */

expected output and the practical output not matching, please explain the logic behind the code

#include<stdio.h>
int main()
{
char *str[] = {"Frogs","Do","Not","Die.","They","Croak"};
printf("%c %c %c",*str[0],*str[1],*str[2]);//expected F D N
printf("\n%u %u %u",str[0],str[1],str[2]);//expected 1000 1006 1003
}
This expected output is based on the assumption that "Frogs" begins at address 1000.
The actual output is as follows:
F D N
2162395060 2162395057 2162395053
How can that be possible? Here the address decreases from str[0] to str[2], and printing the addresses of str[3], str[4] and str[5] shows no pattern, only abrupt changes in the addresses.
You are printing the addresses of three string constants. The compiler is under no obligation to organize the string constants in any predictable fashion.
The compiler is required to provide an array of pointers. The array can be accessed sequentially to obtain addresses of the string constants, but the string constants may be stored in any location which the compiler deems efficient or useful.
I ran the same code on macOS using AppleClang 10.0.0.10001044 and got the following output:
F D N
104431486 104431492 104431495
As you can see, the pointers are sequential using AppleClang.
However, that is irrelevant. Nothing in your code should depend on how the compiler chooses to allocate memory for the string constants.

Is it possible to get the address of a macro?

#define N 100
&N not possible. How can I know the address of N? It must have some address.
A macro is a fragment of code which has been given a name. Whenever
the name is used, it is replaced by the contents of the macro.
So no: N is not a variable, and you can't take the address of N.
MACROS
Macros don't have addresses. The preprocessor replaces them before compilation proper, which is why an object-like macro such as N behaves like a constant value.
A macro is simple text substitution. If you write code like
int* ptr = &N;
it will then be pre-processed into
int* ptr = &100;
100 is an integer literal, and all integer literals are constant rvalues, meaning you can't assign a value to one, nor can you take its address.
The literal 100 is of course stored in memory though - you can't allocate numbers in thin air - but it will most likely be stored as part of the executed program code. In the binary, you'll have some machine code instruction looking like "store 100 in the memory location of the pointer variable" and the 100 is stored in that very machine code instruction. And a part of a machine code instruction isn't addressable.
It is simple text substitution. The compiler's pre-processor replaces all occurrences of N with 100. Whether you can take the address depends on what the macro expands to. In your example, taking the address of a constant won't compile, but the two other examples below do work.
#include <stdio.h>
#define N 100
#define M x
#define L "Hallo world!"
int main()
{
int x = 42;
//printf ("Address of 'N' is %p\n", (void*)&N); // error C2101: '&' on constant
printf ("Address of 'M' is %p\n", (void*)&M);
printf ("Address of 'L' is %p\n", (void*)&L);
return 0;
}
Program output:
Address of 'M' is 0018FF3C
Address of 'L' is 0040C018
MORE explanation.
With #define N 100 you can't get the address of N because a numerical constant like that does not have a single memory location. 100 might be assigned to the value of a variable, or indeed the optimising compiler might load 100 directly into a processor register.
In the case of #define M x that's a simple substitution so that M can be used exactly as x can. There is no functional difference between &x and &M because the two statements are identical after the preprocessor has made the substitution.
In the case of #define L "Hallo world!" we have a string literal, which the compiler does place in memory. Asking for &L is the same as asking for &"Hallo world!" and that is what you get.
N is not a variable, it never has any address. It's just a value to get pasted in when you use patterns like int val=N;. Then you can get the address of val using &.

C memory management in gcc

I am using gcc version 4.7.2 on Ubuntu 12.10 x86_64.
First of all these are the sizes of data types on my terminal:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
sizeof(long long) = 8
sizeof(float) = 4
sizeof(double) = 8
sizeof(long double) = 16
Now please have a look at this code snippet:
int main(void)
{
char c = 'a';
printf("&c = %p\n", &c);
return 0;
}
If I am not wrong, we can't predict anything about the address of c. But each time, this program prints some random hex address ending in f, so the next available location will be some hex value ending in 0.
I observed this pattern in the case of other data types too. For an int value the address was some hex value ending in c. For a double it was some random hex value ending in 8, and so on.
So I have 2 questions here.
1) Who governs this kind of memory allocation? Is it gcc or the C standard?
2) Whoever it is, why is it so? Why is the variable stored in such a way that the next available memory location starts at a hex value ending in 0? Is there any specific benefit?
Now please have a look at this code snippet:
int main(void)
{
double a = 10.2;
int b = 20;
char c = 30;
short d = 40;
printf("&a = %p\n", &a);
printf("&b = %p\n", &b);
printf("&c = %p\n", &c);
printf("&d = %p\n", &d);
return 0;
}
Now what I observed here is completely new to me. I thought the variables would get stored in the same order they are declared. But no, that's not the case. Here is the sample output of one random run:
&a = 0x7fff8686a698
&b = 0x7fff8686a694
&c = 0x7fff8686a691
&d = 0x7fff8686a692
It seems that the variables get sorted in increasing order of their sizes and are then stored in that sorted order, while maintaining observation 1, i.e. the last variable (the largest one) gets stored in such a way that the next available memory location is a hex value ending in 0.
Here are my questions:
3) Who is behind this? Is it gcc or the C standard?
4) Why waste time sorting the variables first and then allocating the memory, instead of directly allocating the memory on a 'first come, first served' basis? Is there any specific benefit to sorting and then allocating memory?
Now please have a look at this code snippet:
int main(void)
{
char array1[] = {1, 2};
int array2[] = {1, 2, 3};
printf("&array1[0] = %p\n", &array1[0]);
printf("&array1[1] = %p\n\n", &array1[1]);
printf("&array2[0] = %p\n", &array2[0]);
printf("&array2[1] = %p\n", &array2[1]);
printf("&array2[2] = %p\n", &array2[2]);
return 0;
}
Now this is also surprising to me. What I observed is that an array is always stored at some random hex address ending in '0' if it has 2 or more elements; if it has fewer than 2 elements, it gets a memory location following observation 1.
So here are my questions:
5) Who is behind storing an array at some random hex address ending in 0? Is it gcc or the C standard?
6) Why waste the memory? I mean, array2 could have been stored immediately after array1 (and hence array2 would have a memory location ending in 2). Instead, array2 is stored at the next hex address ending in 0, thereby leaving 14 memory locations in between. Is there any specific benefit?
The address at which the stack and the heap start is given to the process by the operating system. Everything else is decided by the compiler, using offsets that are known at compile time. Some of these things may follow an existing convention followed in your target architecture and some of these do not.
The C standard does not mandate anything regarding the order of the local variables inside the stack frame (as pointed out in a comment, it doesn't even mandate the use of a stack at all). The standard only bothers to define order when it comes to structs and, even then, it does not define specific offsets, only the fact that these offsets must be in increasing order. Usually, compilers try to align the variables in such a way that access to them takes as few CPU instructions as possible - and the standard permits that, without mandating it.
Part of the reason is mandated by the application binary interface (ABI) specifications for your system and processor.
See the x86 calling conventions and the SVR4 x86-64 ABI supplement (I'm giving the URL of a recent copy; the latest original is surprisingly hard to find on the Web).
Within a given call frame, the compiler could place variables in arbitrary stack slots. It may try (when optimizing) to reorganize the stack at will, e.g. by decreasing alignment constraints. You should not worry about that.
A compiler tries to put local variables at stack locations with suitable alignment. See the alignof extension of GCC. Where exactly the compiler puts these variables is not important, see my answer here. (If it is important to your code, you really should pack the variables in a single common local struct, since each compiler, version and set of optimization flags could do different things; so don't depend on the precise behavior of your particular compiler.)

C array setting of array element value beyond size of array

I have this C code snippet:
int numbers[4]={1};
numbers[0]=1; numbers[1]=2; numbers[3]=3; numbers[10]=4;
printf("numbers: %d %d %d %d %d %d\n",numbers[0],numbers[1],numbers[3],numbers[6],numbers[10], numbers[5]) ;
The Output for this snippet produces :
numbers: 1 2 3 963180397 4 0
Well, I have a couple of questions:
Won't setting numbers[10] give an error, as the array is just of size 4? If not, then why (as it didn't give any error)?
Why does printing numbers[6] give a garbage value whereas numbers[5] gives 0? Shouldn't it also be a garbage value?
What effect does setting numbers[10] have? I know it does not increase the size of the array, but what does it do then?
Thanks in advance. PS: I used GCC to compile the code!
This won't give an error. Your array is declared on the stack, so what numbers[10] does is write at the address numbers + 10 (that is, 10 * sizeof(int) bytes past the start of the array) and overwrite anything that happens to be there.
As Xymostech said, 0 can be as much garbage as 963180397. Printing numbers[6] prints whatever is stored at the address numbers + 6 (6 * sizeof(int) bytes past the start), so it depends on how your program is compiled, whether you have declared local variables before or after numbers, etc.
See answer 1.
What you can do is this :
int empty[100];
int numbers[4]={1};
int empty2[100];
memset(empty, 0xCC, sizeof empty);
memset(empty2, 0xDD, sizeof empty2);
numbers[0]=1;numbers[1]=2;numbers[3]=3;numbers[10]=4;
printf("numbers: %d %d %d %d %d %d\n",numbers[0],numbers[1],numbers[3],numbers[6],numbers[10], numbers[5]) ;
Now you can see what you are overwriting when you access memory outside your numbers array.
To answer your questions:
Not necessarily. The compiler could reserve memory in larger chunks if you declared the array statically or you could have just overwritten whatever else comes after the array on the stack.
That depends on the compiler and falls under "undefined behaviour".
You set *(numbers + 10) to the value after the equals sign.
It doesn't cause any errors because array indexing decays to pointer arithmetic.
When you write numbers[10], it is just *(numbers + 10), i.e. the int located 10 * sizeof(int) bytes past the start of the array, which the compiler accepts as syntactically valid.
It's undefined behavior to access memory you're not meant to (memory not allocated for you), so every value you read through an out-of-bounds index is garbage, including 0.
Accessing indexes greater than 3 will not increase the array's size, as you said; beyond that, it has no defined effect.
