Negative index in array [duplicate] - c

Possible Duplicate:
Negative array indexes in C?
Can I use negative indices in arrays?
#include <stdio.h>
int main(void)
{
char a[] = "pascual";
char *p = a;
p += 3;
printf("%c\n", p[-1]); /* -1 is valid here? */
return 0;
}

Yes, -1 is valid in this context, because p[-1] refers to an element inside your char a[] array. p[-1] is equivalent to *(p-1). Following the chain of assignments in your example, that is *(a+3-1), or *(a+2), i.e. a[2], which is an existing element.
EDIT : The general rule is that an addition / subtraction of an integer and a pointer (and by extension, the equivalent indexing operations on pointers) need to produce a result that points to the same array or one element beyond the end of the array in order to be valid. Thanks, Eric Postpischil for a great note.

C 2011 online draft
6.5.6 Additive operators
8 ...if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist....
Note the condition "provided they exist" (emphasis mine). So, in your specific example, p[-1] is valid, since it refers to an existing element of a; however, a[-1] would not be valid, since it refers to a non-existent element of a. Similarly, p[-4] would not be valid, a[10] would not be valid, etc.

Of course it is valid.
(C99, 6.5.2.1p1) "One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall have integer type, and the result has type ‘‘type’’."

Generally, using such negative indexes is a Bad Idea(TM). However, I found one place where it can be useful: trig lookup tables. For such a lookup table, we need to use some angle measure as the index. For example, I can index sin values for angles between -180 and +180 degrees using degrees as the index. Or, if I want to use radians instead, I can use a multiple of some fraction of PI, say PI/3, as the index. Then I can get cos values between -PI and PI in steps of PI/3.

Yes, this is legal, as C lets you do unsafe pointer arithmetic all day. However, this is confusing, so don't do it. See also this answer to the same question.

Related

GCC accepts index[var] (eg.2[a]), and treats it same as var[index] (eg. a[2]) [duplicate]

As Joel points out in Stack Overflow podcast #34, in C Programming Language (aka: K & R), there is mention of this property of arrays in C: a[5] == 5[a]
Joel says that it's because of pointer arithmetic but I still don't understand. Why does a[5] == 5[a]?
The C standard defines the [] operator as follows:
a[b] == *(a + b)
Therefore a[5] will evaluate to:
*(a + 5)
and 5[a] will evaluate to:
*(5 + a)
Here a is converted to a pointer to the first element of the array. a[5] is the element 5 positions past that, which is *(a + 5), and 5[a] is *(5 + a). From elementary school math we know those are equal (addition is commutative).
Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), and that addition is commutative.
I think something is being missed by the other answers.
Yes, p[i] is by definition equivalent to *(p+i), which (because addition is commutative) is equivalent to *(i+p), which (again, by the definition of the [] operator) is equivalent to i[p].
(And in array[i], the array name is implicitly converted to a pointer to the array's first element.)
But the commutativity of addition is not all that obvious in this case.
When both operands are of the same type, or even of different numeric types that are promoted to a common type, commutativity makes perfect sense: x + y == y + x.
But in this case we're talking specifically about pointer arithmetic, where one operand is a pointer and the other is an integer. (Integer + integer is a different operation, and pointer + pointer is nonsense.)
The C standard's description of the + operator (N1570 6.5.6) says:
For addition, either both operands shall have arithmetic type, or one
operand shall be a pointer to a complete object type and the other
shall have integer type.
It could just as easily have said:
For addition, either both operands shall have arithmetic type, or the left
operand shall be a pointer to a complete object type and the right operand
shall have integer type.
in which case both i + p and i[p] would be illegal.
In C++ terms, we really have two sets of overloaded + operators, which can be loosely described as:
pointer operator+(pointer p, integer i);
and
pointer operator+(integer i, pointer p);
of which only the first is really necessary.
So why is it this way?
C++ inherited this definition from C, which got it from B (the commutativity of array indexing is explicitly mentioned in the 1972 Users' Reference to B), which got it from BCPL (manual dated 1967), which may well have gotten it from even earlier languages (CPL? Algol?).
So the idea that array indexing is defined in terms of addition, and that addition, even of a pointer and an integer, is commutative, goes back many decades, to C's ancestor languages.
Those languages were much less strongly typed than modern C is. In particular, the distinction between pointers and integers was often ignored. (Early C programmers sometimes used pointers as unsigned integers, before the unsigned keyword was added to the language.) So the idea of making addition non-commutative because the operands are of different types probably wouldn't have occurred to the designers of those languages. If a user wanted to add two "things", whether those "things" are integers, pointers, or something else, it wasn't up to the language to prevent it.
And over the years, any change to that rule would have broken existing code (though the 1989 ANSI C standard might have been a good opportunity).
Changing C and/or C++ to require putting the pointer on the left and the integer on the right might break some existing code, but there would be no loss of real expressive power.
So now we have arr[3] and 3[arr] meaning exactly the same thing, though the latter form should never appear outside the IOCCC.
And, of course
("ABCD"[2] == 2["ABCD"]) && (2["ABCD"] == 'C') && ("ABCD"[2] == 'C')
The main reason for this was that back in the 70's when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)"
This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But, if B was the same object as A, then an assembly level optimization was available. But the compiler wasn't bright enough to recognize it, so the developer had to (A += C). Similarly, if C was 1, a different assembly level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recently compilers do, so those syntaxes are largely unnecessary these days)
One thing no-one seems to have mentioned about Dinah's problem with sizeof:
You can only add an integer to a pointer, you can't add two pointers together. That way when adding a pointer to an integer, or an integer to a pointer, the compiler always knows which bit has a size that needs to be taken into account.
To answer the question literally: it is not always true that x == x.
double zero = 0.0;
double a[] = { 0,0,0,0,0, zero/zero}; // NaN
cout << (a[5] == 5[a] ? "true" : "false") << endl;
prints
false
I just found out that this ugly syntax could be "useful", or at least very fun to play with, when you want to deal with an array of indexes that refer to positions in the same array. It can replace nested square brackets and make the code more readable!
int a[] = { 2 , 3 , 3 , 2 , 4 };
int s = sizeof a / sizeof *a; // s == 5
for(int i = 0 ; i < s ; ++i) {
cout << a[a[a[i]]] << endl;
// ... is equivalent to ...
cout << i[a][a][a] << endl; // but I prefer this one, it's easier to increase the level of indirection (without loop)
}
Of course, I'm quite sure that there is no use case for that in real code, but I found it interesting anyway :)
Nice question/answers.
Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.
Consider the following declarations:
int a[10];
int* p = a;
In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.
For pointers in C, we have
a[5] == *(a + 5)
and also
5[a] == *(5 + a)
Hence it is true that a[5] == 5[a].
Not an answer, but just some food for thought.
If a class has an overloaded index/subscript operator, the expression 0[x] will not work:
class Sub
{
public:
int operator [](size_t nIndex)
{
return 0;
}
};
int main()
{
Sub s;
s[0];
0[s]; // ERROR
}
Since we don't have access to the int "class" (it is a built-in type), this cannot be done:
class int
{
int operator[](const Sub&);
};
There is a very good explanation in A TUTORIAL ON POINTERS AND ARRAYS IN C by Ted Jensen.
Ted Jensen explained it as:
In fact, this is true, i.e wherever one writes a[i] it can be
replaced with *(a + i) without any problems. In fact, the compiler
will create the same code in either case. Thus we see that pointer
arithmetic is the same thing as array indexing. Either syntax produces
the same result.
This is NOT saying that pointers and arrays
are the same thing, they are not. We are only saying that to identify
a given element of an array we have the choice of two syntaxes, one
using array indexing and the other using pointer arithmetic, which
yield identical results.
Now, looking at this last
expression, part of it.. (a + i), is a simple addition using the +
operator and the rules of C state that such an expression is
commutative. That is (a + i) is identical to (i + a). Thus we could
write *(i + a) just as easily as *(a + i).
But *(i + a) could have come from i[a] ! From all of this comes the curious
truth that if:
char a[20];
writing
a[3] = 'x';
is the same as writing
3[a] = 'x';
I know the question is answered, but I couldn't resist sharing this explanation.
I remember this from Principles of Compiler Design.
Let's assume a is an int array, the size of int is 2 bytes,
and the base address of a is 1000.
How a[5] works ->
Base address of array a + (5 * sizeof(element type of a))
i.e. 1000 + (5*2) = 1010
Similarly, when the C code is broken down into 3-address code,
5[a] becomes ->
Base address of array a + (sizeof(element type of a) * 5)
i.e. 1000 + (2*5) = 1010
So both expressions refer to the same location in memory, and hence a[5] == 5[a].
The same address calculation is why negative indexes can work in C.
i.e. if I access a[-5], the address computed is
Base address of array a + (-5 * sizeof(element type of a))
i.e. 1000 + (-5*2) = 990
and the access refers to the object at location 990. This is only valid C when that location is still inside the same array (for example, when a is really a pointer to an element at index 5 or later); otherwise the behavior is undefined.
In a C compiler,
a[i]
i[a]
*(a+i)
are different ways to refer to an element in an array ! (NOT AT ALL WEIRD)
In C arrays, arr[3] and 3[arr] are the same, and their equivalent pointer notations are *(arr + 3) and *(3 + arr). In contrast, [arr]3 or [3]arr is not correct and results in a syntax error, just as (arr + 3)* and (3 + arr)* are not valid expressions. The reason is that the dereference operator must be placed before the address yielded by the expression, not after it.
A little bit of history now. Among other languages, BCPL had a fairly major influence on C's early development. If you declared an array in BCPL with something like:
let V = vec 10
that actually allocated 11 words of memory, not 10. Typically V was the first, and contained the address of the immediately following word. So unlike C, naming V went to that location and picked up the address of the zeroth element of the array. Therefore array indirection in BCPL, expressed as
let J = V!5
really did have to do J = !(V + 5) (using BCPL syntax) since it was necessary to fetch V to get the base address of the array. Thus V!5 and 5!V were synonymous. As an anecdotal observation, WAFL (Warwick Functional Language) was written in BCPL, and to the best of my memory tended to use the latter syntax rather than the former for accessing the nodes used as data storage. Granted this is from somewhere between 35 and 40 years ago, so my memory is a little rusty. :)
The innovation of dispensing with the extra word of storage and having the compiler insert the base address of the array when it was named came later. According to the C history paper this happened at about the time structures were added to C.
Note that ! in BCPL was both a unary prefix operator and a binary infix operator, in both cases doing indirection; the binary form just added the two operands before doing the indirection. Given the word-oriented nature of BCPL (and B) this actually made a lot of sense. The restriction of "pointer and integer" was made necessary in C when it gained data types, and sizeof became a thing.
Because it's useful to avoid confusing nesting.
Would you rather read this:
array[array[head].next].prev
or this:
head[array].next[array].prev
Incidentally, C++ has a similar commutative property for function calls. Rather than writing g(f(x)) as you must in C, you may use member functions to write x.f().g(). Replace f and g with lookup tables and you can write g[f[x]] (functional style) or (x[f])[g] (oop style). The latter gets really nice with structs containing indices: x[xs].y[ys].z[zs]. Using the more common notation that's zs[ys[xs[x].y].z].
Well, this is a feature that is only possible because of the language support.
The compiler interprets a[i] as *(a+i) and the expression 5[a] evaluates to *(5+a). Since addition is commutative it turns out that both are equal. Hence the expression evaluates to true.
In C
int a[]={10,20,30,40,50};
int *p=a;
printf("%d\n",*p++);//output will be 10
printf("%d\n",*a++);//will give an error
Pointer p is a "variable", while array name a is a "mnemonic" or "synonym",
so p++ is valid but a++ is invalid.
a[2] is equal to 2[a] because both are evaluated with pointer arithmetic: internally, *(a+2) equals *(2+a).
Because the C compiler always converts array notation to pointer notation:
a[5] == *(a + 5) and also 5[a] == *(5 + a) == *(a + 5)
So, both are equal.
C was based on BCPL. BCPL directly exposed memory as a sequence of addressable words. The unary operator !X (also known as RV) gave you the contents of the address location X. For convenience there was also a binary operator X!Y equivalent to !(X+Y) which gave you the contents of the Y'th word of an array at location X, or equivalently, the X'th word of an array at location Y.
In C, X!Y became X[Y], but the original BCPL semantics of !(X+Y) show through, which accounts for why the operator is commutative.

Why is the printf statement in the code below printing a value rather than a garbage value?

#include <stdio.h>
int main(void){
int array[] = {10,20,30,40,50};
printf("%d\n",-2[array -2]);
return 0 ;
}
Can anyone explain how -2[array-2] is working and Why are [ ] used here?
This was a question in my assignment; it gives the output "-10", but I don't understand why.
Technically speaking, this invokes undefined behaviour. Quoting C11, chapter §6.5.6
If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. [....]
So, evaluating (array-2) is undefined behavior.
However, in practice most compilers will simply do the address arithmetic, and the +2 and -2 cancel out [2[a] is the same as a[2], which is the same as *(a+2); thus 2[a-2] is *((2)+(a-2))], leaving only *(a), i.e. a[0], to be evaluated.
Then, check the operator precedence
-2[array -2] is effectively the same as -(array[0]). So, the result is the value of array[0], negated.
This is an unfortunate example for instruction, because it implies it's okay to do some incorrect things that often work in practice.
The technically correct answer is that the program has Undefined Behavior, so any result is possible, including printing -10, printing a different number, printing something different or nothing at all, failing to run, crashing, and/or doing something entirely unrelated.
The undefined behavior comes up from evaluating the subexpression array -2. array decays from its array type to a pointer to the first element. array -2 would point at the element which comes two positions before that, but there is no such element (and it's not the "one-past-the-end" special rule), so evaluating that is a problem no matter what context it appears in.
(C11 6.5.6/8 says)
When an expression that has integer type is added to or subtracted from a pointer, .... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Now the technically incorrect answer the instructor is probably looking for is what actually happens on most implementations:
Even though array -2 is outside the actual array, it evaluates to some address which is 2*sizeof(int) bytes before the address where the array's data starts. It's invalid to dereference that address since we don't know that there actually is any int there, but we're not going to.
Looking at the larger expression -2[array -2], the [] operator has higher precedence than the unary - operator, so it means -(2[array -2]) and not (-2)[array -2]. A[B] is defined to mean the same as *((A)+(B)). It's customary to have A be a pointer value and B be an integer value, but it's also legal to use them reversed like we're doing here. So these are equivalent:
-2[array -2]
-(2[array -2])
-(*(2 + (array - 2)))
-(*(array))
The last step acts like we would expect: Adding two to the address value of array - 2 is 2*sizeof(int) bytes after that value, which gets us back to the address of the first array element. So *(array) dereferences that address, giving 10, and -(*(array)) negates that value, giving -10. The program prints -10.
You should never count on things like this, even if you observe it "works" on your system and compiler. Since the language guarantees nothing about what will happen, the code might not work if you make slight changes which seem they shouldn't be related, or on a different system, a different compiler, a different version of the same compiler, or using the same system and compiler on a different day.
Here is how -2[array-2] is evaluated:
First, note that -2[array-2] is parsed as - (2[array-2]). The subscript operator, [...] has higher precedence than the unary - operator. We often think of constants like -2 as single numbers, but it is in fact a - operator applied to a 2.
In array-2, array is automatically converted to a pointer to its first element, so it points to array[0].
Then array-2 attempts to calculate a pointer to two elements before the first element of the array. The resulting behavior is not defined by the C standard because C 2018 6.5.6 8 says that only arithmetic that points to array members and the end of the array is defined.
For illustration only, suppose we are using a C implementation that extends the C standard by defining pointers to use a flat address space and permit arbitrary pointer arithmetic. Then array-2 points two elements before the array.
Then 2[array-2] uses the fact that the C standard defines E1[E2] to be *((E1)+(E2)). That is, the subscript operator is implemented by adding the two things and applying *. Thus, it does not matter which expression is E1 and which is E2. E1+E2 is the same as E2+E1. So 2[array-2] is *(2 + (array-2)). Adding 2 moves the pointer from two elements before the array back to the start of the array. Then applying * produces the element at that location, which is 10.
Finally, applying - gives −10. (Recall that this conclusion is only achieved using our supposition that the C implementation supports a flat address space. You cannot use this in general C code.)
This code invokes undefined behavior and can print anything, including -10.
C17 6.5.2.1 Array subscripting states:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))
Meaning array[n] is equivalent to *((array) + (n)) and that's how the compiler evaluates subscripting. This allows us to write silly obfuscation like n[array] as 100% equivalent to array[n]. Because *((n) + (array)) is equivalent to *((array) + (n)). As explained here:
With arrays, why is it the case that a[5] == 5[a]?
Looking at the expression -2[array -2] specifically:
[array -2] and [array - 2] are naturally equivalent. In this case the former is just sloppy style purposely used for the sake of obfuscating the code.
Operator precedence tells us to first consider [].
Thus the expression is equivalent to -*( (2) + (array - 2) )
Note that the first - is not part of the integer constant 2. C does not support negative integer constants (see note 1 below); the - is actually the unary minus operator.
Unary minus has lower precedence than [], so the 2 in -2[ "binds" to the [.
The sub-expression (array - 2) is evaluated individually and invokes undefined behavior, as per C17 6.5.6/8:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. /--/ If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Speculatively, one potential form of undefined behavior could be that a compiler decides to replace the whole expression (2) + (array - 2) with array, in which case the whole expression would end up as -*array and prints -10.
There are no guarantees of this, and therefore the code is bad. If you were given the assignment to explain why the code prints -10, your teacher is incompetent. Not only is it meaningless/harmful to study obfuscation as part of C studies, it is harmful to rely on undefined behavior or expect it to give a certain result.
1) C rather supports negative integer constant expressions. -2 is an integer constant expression, where 2 is an integer constant of type int.

why this C program is working [duplicate]

As Joel points out in Stack Overflow podcast #34, in C Programming Language (aka: K & R), there is mention of this property of arrays in C: a[5] == 5[a]
Joel says that it's because of pointer arithmetic but I still don't understand. Why does a[5] == 5[a]?
The C standard defines the [] operator as follows:
a[b] == *(a + b)
Therefore a[5] will evaluate to:
*(a + 5)
and 5[a] will evaluate to:
*(5 + a)
a is a pointer to the first element of the array. a[5] is the value that's 5 elements further from a, which is the same as *(a + 5), and from elementary school math we know those are equal (addition is commutative).
Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), which is commutative.
I think something is being missed by the other answers.
Yes, p[i] is by definition equivalent to *(p+i), which (because addition is commutative) is equivalent to *(i+p), which (again, by the definition of the [] operator) is equivalent to i[p].
(And in array[i], the array name is implicitly converted to a pointer to the array's first element.)
But the commutativity of addition is not all that obvious in this case.
When both operands are of the same type, or even of different numeric types that are promoted to a common type, commutativity makes perfect sense: x + y == y + x.
But in this case we're talking specifically about pointer arithmetic, where one operand is a pointer and the other is an integer. (Integer + integer is a different operation, and pointer + pointer is nonsense.)
The C standard's description of the + operator (N1570 6.5.6) says:
For addition, either both operands shall have arithmetic type, or one
operand shall be a pointer to a complete object type and the other
shall have integer type.
It could just as easily have said:
For addition, either both operands shall have arithmetic type, or the left
operand shall be a pointer to a complete object type and the right operand
shall have integer type.
in which case both i + p and i[p] would be illegal.
In C++ terms, we really have two sets of overloaded + operators, which can be loosely described as:
pointer operator+(pointer p, integer i);
and
pointer operator+(integer i, pointer p);
of which only the first is really necessary.
So why is it this way?
C++ inherited this definition from C, which got it from B (the commutativity of array indexing is explicitly mentioned in the 1972 Users' Reference to B), which got it from BCPL (manual dated 1967), which may well have gotten it from even earlier languages (CPL? Algol?).
So the idea that array indexing is defined in terms of addition, and that addition, even of a pointer and an integer, is commutative, goes back many decades, to C's ancestor languages.
Those languages were much less strongly typed than modern C is. In particular, the distinction between pointers and integers was often ignored. (Early C programmers sometimes used pointers as unsigned integers, before the unsigned keyword was added to the language.) So the idea of making addition non-commutative because the operands are of different types probably wouldn't have occurred to the designers of those languages. If a user wanted to add two "things", whether those "things" are integers, pointers, or something else, it wasn't up to the language to prevent it.
And over the years, any change to that rule would have broken existing code (though the 1989 ANSI C standard might have been a good opportunity).
Changing C and/or C++ to require putting the pointer on the left and the integer on the right might break some existing code, but there would be no loss of real expressive power.
So now we have arr[3] and 3[arr] meaning exactly the same thing, though the latter form should never appear outside the IOCCC.
And, of course
("ABCD"[2] == 2["ABCD"]) && (2["ABCD"] == 'C') && ("ABCD"[2] == 'C')
The main reason for this was that back in the 70's when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)"
This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But, if B was the same object as A, then an assembly level optimization was available. But the compiler wasn't bright enough to recognize it, so the developer had to (A += C). Similarly, if C was 1, a different assembly level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recently compilers do, so those syntaxes are largely unnecessary these days)
One thing no-one seems to have mentioned about Dinah's problem with sizeof:
You can only add an integer to a pointer, you can't add two pointers together. That way when adding a pointer to an integer, or an integer to a pointer, the compiler always knows which bit has a size that needs to be taken into account.
To answer the question literally. It is not always true that x == x
double zero = 0.0;
double a[] = { 0,0,0,0,0, zero/zero}; // NaN
cout << (a[5] == 5[a] ? "true" : "false") << endl;
prints
false
I just find out this ugly syntax could be "useful", or at least very fun to play with when you want to deal with an array of indexes which refer to positions into the same array. It can replace nested square brackets and make the code more readable !
int a[] = { 2 , 3 , 3 , 2 , 4 };
int s = sizeof a / sizeof *a; // s == 5
for(int i = 0 ; i < s ; ++i) {
cout << a[a[a[i]]] << endl;
// ... is equivalent to ...
cout << i[a][a][a] << endl; // but I prefer this one, it's easier to increase the level of indirection (without loop)
}
Of course, I'm quite sure that there is no use case for that in real code, but I found it interesting anyway :)
Nice question/answers.
Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.
Consider the following declarations:
int a[10];
int* p = a;
In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.
For pointers in C, we have
a[5] == *(a + 5)
and also
5[a] == *(5 + a)
Hence it is true that a[5] == 5[a].
Not an answer, but just some food for thought.
If class is having overloaded index/subscript operator, the expression 0[x] will not work:
class Sub
{
public:
int operator [](size_t nIndex)
{
return 0;
}
};
int main()
{
Sub s;
s[0];
0[s]; // ERROR
}
Since we dont have access to int class, this cannot be done:
class int
{
int operator[](const Sub&);
};
It has very good explanation in A TUTORIAL ON POINTERS AND ARRAYS IN C
by Ted Jensen.
Ted Jensen explained it as:
In fact, this is true, i.e wherever one writes a[i] it can be
replaced with *(a + i) without any problems. In fact, the compiler
will create the same code in either case. Thus we see that pointer
arithmetic is the same thing as array indexing. Either syntax produces
the same result.
This is NOT saying that pointers and arrays
are the same thing, they are not. We are only saying that to identify
a given element of an array we have the choice of two syntaxes, one
using array indexing and the other using pointer arithmetic, which
yield identical results.
Now, looking at this last
expression, part of it.. (a + i), is a simple addition using the +
operator and the rules of C state that such an expression is
commutative. That is (a + i) is identical to (i + a). Thus we could
write *(i + a) just as easily as *(a + i).
But *(i + a) could have come from i[a] ! From all of this comes the curious
truth that if:
char a[20];
writing
a[3] = 'x';
is the same as writing
3[a] = 'x';
I know the question is answered, but I couldn't resist sharing this explanation.
I remember Principles of Compiler design,
Let's assume a is an int array and size of int is 2 bytes,
& Base address for a is 1000.
How a[5] will work ->
Base Address of your Array a + (5*size of(data type for array a))
i.e. 1000 + (5*2) = 1010
So,
Similarly when the c code is broken down into 3-address code,
5[a] will become ->
Base Address of your Array a + (size of(data type for array a)*5)
i.e. 1000 + (2*5) = 1010
So basically both the statements are pointing to the same location in memory and hence, a[5] = 5[a].
This explanation is also the reason why negative indexes in arrays work in C.
i.e. if I access a[-5] it will give me
Base Address of your Array a + (-5 * size of(data type for array a))
i.e. 1000 + (-5*2) = 990
It will return me object at location 990.
in c compiler
a[i]
i[a]
*(a+i)
are different ways to refer to an element in an array ! (NOT AT ALL WEIRD)
In C arrays, arr[3] and 3[arr] are the same, and their equivalent pointer notations are *(arr + 3) to *(3 + arr). But on the contrary [arr]3 or [3]arr is not correct and will result into syntax error, as (arr + 3)* and (3 + arr)* are not valid expressions. The reason is dereference operator should be placed before the address yielded by the expression, not after the address.
A little bit of history now. Among other languages, BCPL had a fairly major influence on C's early development. If you declared an array in BCPL with something like:
let V = vec 10
that actually allocated 11 words of memory, not 10. Typically V was the first, and contained the address of the immediately following word. So unlike C, naming V went to that location and picked up the address of the zeroeth element of the array. Therefore array indirection in BCPL, expressed as
let J = V!5
really did have to do J = !(V + 5) (using BCPL syntax) since it was necessary to fetch V to get the base address of the array. Thus V!5 and 5!V were synonymous. As an anecdotal observation, WAFL (Warwick Functional Language) was written in BCPL, and to the best of my memory tended to use the latter syntax rather than the former for accessing the nodes used as data storage. Granted this is from somewhere between 35 and 40 years ago, so my memory is a little rusty. :)
The innovation of dispensing with the extra word of storage and having the compiler insert the base address of the array when it was named came later. According to the C history paper this happened at about the time structures were added to C.
Note that ! in BCPL was both a unary prefix operator and a binary infix operator, in both cases doing indirection; the binary form simply added the two operands before doing the indirection. Given the word-oriented nature of BCPL (and B) this actually made a lot of sense. The restriction of "pointer and integer" was made necessary in C when it gained data types, and sizeof became a thing.
Because it's useful to avoid confusing nesting.
Would you rather read this:
array[array[head].next].prev
or this:
head[array].next[array].prev
Incidentally, C++ has a similar commutative property for function calls. Rather than writing g(f(x)) as you must in C, you may use member functions to write x.f().g(). Replace f and g with lookup tables and you can write g[f[x]] (functional style) or (x[f])[g] (oop style). The latter gets really nice with structs containing indices: x[xs].y[ys].z[zs]. Using the more common notation that's zs[ys[xs[x].y].z].
Well, this is a feature that is only possible because of the language support.
The compiler interprets a[i] as *(a+i) and the expression 5[a] evaluates to *(5+a). Since addition is commutative it turns out that both are equal. Hence the expression evaluates to true.
In C
int a[]={10,20,30,40,50};
int *p=a;
printf("%d\n",*p++);//output will be 10
printf("%d\n",*a++);//will give an error
Pointer p is a "variable", array name a is a "mnemonic" or "synonym",
so p++ is valid but a++ is invalid.
a[2] is equal to 2[a] because both are evaluated with pointer arithmetic: a[2] is calculated as *(a+2) and 2[a] as *(2+a).
Because the C compiler always converts array notation into pointer notation:
a[5] = *(a + 5) and also 5[a] = *(5 + a) = *(a + 5)
So, both are equal.
C was based on BCPL. BCPL directly exposed memory as a sequence of addressable words. The unary operator !X (also known as RV) gave you the contents of the address location X. For convenience there was also a binary operator X!Y equivalent to !(X+Y) which gave you the contents of the Y'th word of an array at location X, or equivalently, the X'th word of an array at location Y.
In C, X!Y became X[Y], but the original BCPL semantics of !(X+Y) show through, which accounts for why the operator is commutative.

Why (1)["abcd"]+"efg"-'b'+1 becomes "fg"?

#include <stdio.h>
int main()
{
printf("%s", (1)["abcd"]+"efg"-'b'+1);
}
Can someone please explain why the output of this code is:
fg
I know (1)["abcd"] points to "bcd" but why +"efg"-'b'+1 is even a valid syntax ?
I know (1)["abcd"] points to "bcd"
No. (1)["abcd"] is a single char (b).
So (1)["abcd"]+"efg"-'b'+1 is: 'b' + "efg" - 'b' + 1 and if you simplify it, it becomes "efg" + 1. Hence it prints fg.
Note: The above answer explains only the observed behaviour which is not strictly legal as per the C language specification. Here's why.
case 1: 'b' < 0 or 'b' > 4
In this case, the expression (1)["abcd"] + "efg" - 'b' + 1 will lead to undefined behaviour, due to the sub-expression (1)["abcd"] + "efg", which is 'b' + "efg", producing an invalid pointer expression (C11, 6.5.6 Additive operators -- quote below).
On the widely used ASCII character set, 'b' is 98 in decimal; on the not-so-widely used EBCDIC character set, 'b' is 130 in decimal. So the sub-expression (1)["abcd"] + "efg" would cause undefined behaviour on a system using either of these two.
So barring a weird architecture, where 'b' <= 4 and 'b' >= 0, this program would cause undefined behaviour due to how the
C language is defined:
C11, 5.1.2.3 Program execution
The semantic descriptions in this International Standard describe the
behavior of an abstract machine in which issues of optimization are
irrelevant. [...] In the abstract machine, all expressions are
evaluated as specified by the semantics. An actual implementation need
not evaluate part of an expression if it can deduce that its value is
not used and that no needed side effects are produced.
which categorically states that the whole standard has been defined based on the abstract machine's behaviour.
So in this case, it does cause undefined behaviour.
case 2: 'b' >= 0 and 'b' <= 4 (This is quite imaginary, but in theory, it's possible).
In this case, the subexpression (1)["abcd"] + "efg" can be valid (and in turn, the whole expression (1)["abcd"] + "efg" - 'b' + 1).
The string literal "efg" consists of 4 chars (the three letters plus the terminating null) and has an array type (char[4] in C), and the C standard guarantees (as quoted below) that the pointer expression evaluating to one past the end of an array doesn't overflow or cause undefined behaviour.
The following are the possible sub-expressions and they are valid:
(1) "efg"+0 (2) "efg"+1 (3) "efg"+2 (4) "efg"+3 and (5) "efg"+4 because the C standard states that:
C11, 6.5.6 Additive operators
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
So it's not causing undefined behaviour in this case.
Thanks @zch & @Keith Thompson for digging out the relevant parts of C standard :)
There seems to be some confusion about the difference between the other two answers. Here's what happens, step by step:
(1)["abcd"]+"efg"-'b'+1
The first part, (1)["abcd"] takes advantage of the way arrays are processed in C. Let's look at the following:
int a[5] = { 0, 10, 20, 30, 40 };
printf("%d %d\n", a[2], 2[a]);
The output will be 20 20. Why? Because the name of an array of int evaluates to its address, and its data type is pointer to int. Referring to an element of the integer array tells C to add an offset to the address of the array and evaluate the result as type int. But this means C doesn't care about the order: a[2] is exactly the same as 2[a].
Similarly, since a is the address of the array, a + 1 is the address of the element at the first offset into the array. Of course, that's equivalent to 1 + a.
A string in C is just another, human-friendly, way of representing an array of type char. So (1)["abcd"] is the same as returning the element at the first offset into an array of the characters a, b, c, d, \0 ... which is the character b.
In C, every character has an integral value (generally its ASCII code). The value of b happens to be 98. The remainder of the evaluation, therefore, involves calculations with integers and an array: the character string "efg".
We have the address of the string. We add and subtract 98 (the ASCII value of the character b), and we add 1. The b's cancel each other, so the net result is one more than the address of the first character in the string, which is the address of the character f.
The %s conversion in the printf() tells C to treat the address as the first character in a string, and to print the entire string until it encounters the null character at the end.
So it prints fg, which is the part of the string "efg" that starts at the f.

why sizeof unsigned char array[10] is 10

The size of char is 1 byte, and wikipedia says:
sizeof is used to calculate the size of any datatype, measured in the
number of bytes required to represent the type.
However, i can store 11 bytes in unsigned char array[10] 0..10 but when i do sizeof(array) i get 10 bytes. can someone explain this behavior?
note: i have tried this on int datatype, the sizeof(array) was 40, where i expect it to be 44.
However, i can store 11 bytes in unsigned char array[10]
No, you cannot: 10 is not a valid index of array[10]. Arrays are indexed from zero to size minus one.
According to C99 Standard
6.5.3.4.3 When [sizeof operator is] applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
That is why the result is going to be ten on all standard-compliant platforms.
No, the valid indices will be 0-9 not 0-10, it will store 10 elements not 11, so the result of sizeof is correct. Accessing beyond index 9 will be out of bounds and undefined behavior. The relevant section of the C99 draft standard is 6.5.6/8, which covers pointer arithmetic:
[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
Unlike the C++ standard which explicitly states an array has N elements numbered 0 to N-1 it looks like you need to dig into the examples for a similar statement in the C standard. In the C99 draft standard section 6.5.2.1/4, the example is:
int x[3][5];
and it goes on to state:
Here x is a 3 x 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints.
unsigned char array[10];/*Array of 10 elements*/
which means
array[0],array[1],array[2],array[3].......array[9]
so sizeof(array)=10 is correct.
