Negative array index in C

There have been other questions and answers on negative array indices in C in the forum, but I would like answers to these for a 32-bit compiler:
If we have an array defined as int test_array[5] = {1,2,3,4,5};
then what should the following expressions return?
test_array[20], test_array[-2], test_array[-32764], test_array[4294967700] (a value greater than what 32 bits can hold), *(test_array-32764), etc.
Does the compiler force any fixed value to be returned when the index goes beyond its declared range?

It is undefined behaviour as written, since you are accessing the array out of bounds.
However, negative indices do not necessarily mean undefined behaviour. The following code is well defined:
int test_array[5] = {1,2,3,4,5};
int *p = test_array + 1;
int i = p[-1];//i now has the value 1
This is equivalent to:
int i = *(p-1);

Accessing an array beyond its bounds results in Undefined Behavior (UB).
A negative index applied to the array itself (such as test_array[-2]) is not a valid index and results in Undefined Behavior.
Undefined Behavior means literally anything can happen. If you are lucky, your program will crash and the problem gets detected; if you are unlucky, the code works fine all along and all hell breaks loose some day.
So, always avoid writing any code that causes Undefined Behavior.
Does the compiler force any fixed value to be returned when the index goes beyond its declared range?
No.
The programmer has to take care of this. The Standard does not require the compiler to give you any indication or warning for this; it simply defines it as UB.
Furthermore, the standard (Annex J.2) lists all of the following scenarios as causing Undefined Behavior:
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that does not point into, or just beyond, the same array object.
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated.
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]).
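Since the compiler does not check subscripts, one option is to validate the index yourself before touching memory. A sketch, assuming a hypothetical helper named checked_get that receives the element count explicitly; the index parameter is a signed long long so that every index from the question can be passed and rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reads arr[idx] only when 0 <= idx < n.
   Returns false (and leaves *out untouched) for any other index. */
bool checked_get(const int *arr, size_t n, long long idx, int *out)
{
    assert(arr != NULL && out != NULL);
    if (idx < 0 || (unsigned long long)idx >= n)
        return false;          /* out of range: never form or read arr + idx */
    *out = arr[idx];
    return true;
}
```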

Accessing elements outside the array is Undefined Behaviour.
Furthermore, making a pointer point to an element outside of the array, except for the (nonexistent) one-past-the-last element, is also Undefined Behaviour. Accessing the one-past-the-last element is Undefined Behaviour, although merely forming a pointer to it is OK.
int arr[42] = {0};
int *ptr = arr;
ptr += 41; /* ok, ptr points to the last element of arr */
*ptr; /* ok */
ptr += 1; /* ok, ptr points to one-past-the-last */
*ptr; /* UB */
ptr += 1; /* UB */
ptr = arr;
ptr -= 1; /* UB */
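The one-past-the-last pointer is what makes the usual pointer loop legal: it may be formed and compared, just never dereferenced. A sketch with a hypothetical sum_ints helper:

```c
#include <assert.h>
#include <stddef.h>

/* Sums arr[0..n-1] by walking a pointer up to the one-past-the-last
   position. Forming `end` and comparing against it is well defined;
   *end is never evaluated. */
long sum_ints(const int *arr, size_t n)
{
    assert(arr != NULL);
    const int *end = arr + n;      /* one past the last element: OK to form */
    long total = 0;
    for (const int *p = arr; p != end; ++p)
        total += *p;               /* only in-range elements are read */
    return total;
}
```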

Related

Trouble incrementing and decrementing a malloced array of multiple data types in C

In my Computer Science course, we have been taught a method of storing a value in the 0th element of a malloced array, then incrementing the array so that things such as the size of the array can be stored in that element and retrieved later. I have tried using a modified version of this method to store various datatypes in these incremented elements.
Here is an example of how such an array is created:
int *array;
array = malloc(sizeof(int) + sizeof(double) + (n * sizeof(int)));
*(array) = n;
array++;
(double*)array++;
return array;
In this example, the sizeof(int) and sizeof(double) in the malloc call reserve the header elements: the int element stores the size of the array, and the double element can store something like the average of all the numbers in the array (excluding these two elements, of course).
(n * sizeof(int)) reserves the rest of the elements in the array, where n is the number of elements and sizeof(int) is the size of the desired element type; in theory this should work for an array of any data type.
Now, here is the trouble I am having:
I have created another function to retrieve the size of the array, but I am having trouble decrementing and incrementing the array. Here is my code:
int getArraySize(void* array){
(double*)array--;//Decrement past the double element
(int*)array--;//Decrement past the int element
int size = *((int*)array);//Acquire size of the array
(int*)array++;//Increment past int element
(double*)array++;//Increment past the double element
return size;}
This function fails to get the size of the array, and I have realized it is because the compiler first increments the array and then type-casts it. However, when I try to fix such increment/decrement statements as follows:
((int*)array)++;
I get an error that says "lvalue required as increment operand". I do not know how to fix this notation so that it increments and decrements correctly. Any suggestions would be much appreciated.

In my Computer Science course, we have been taught a method of storing a value in the 0th element of a malloced array, then incrementing the array so that things such as the size of the array can be stored in that element and retrieved later.
Sorry to hear that, since this is utter nonsense. Use a struct instead.
What's worse than the task being nonsense, however, is that it also invokes undefined behavior (see the C standard 6.5.6). You cannot do pointer arithmetic with pointers that do not point into an array of the same type as the pointer itself.
In addition, this may lead to misaligned access. Depending on the CPU, misalignment could cause needlessly slow code or instruction traps leading to a program crash. Misaligned access is also undefined behavior.
Also, storing the result of various operations on a data type, such as the average, inside the data type itself doesn't make any sense at all. Such values would have to be updated as soon as any element changes, which causes needless bloat and inefficient code.
Forget about all this nonsense immediately. Your program cannot be fixed or repaired, since the very idea behind it is fundamentally wrong. Do this instead:
typedef struct
{
int i;
double d;
int array[];
} something;
something* s = malloc(sizeof(something) + sizeof(int[n]));
s->i = ...;
s->d = ...;
for(int i=0; i<n; i++)
s->array[i] = ...;
...
free(s);
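A slightly fuller sketch of the struct approach, assuming hypothetical names int_array, int_array_create and int_array_size. Because the header is a real struct member rather than a reinterpreted array slot, the compiler takes care of alignment, and no pointer is ever moved before the start of the allocation:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; same layout as the struct above: header fields
   first, then a flexible array member (C99 and later). */
typedef struct
{
    int size;      /* number of elements in data[] */
    double extra;  /* any other header value the assignment needs */
    int data[];    /* the array itself */
} int_array;

/* Allocates the header and n ints in one block. The compiler inserts any
   padding needed, so every member is correctly aligned. */
int_array *int_array_create(int n)
{
    assert(n >= 0);
    int_array *s = malloc(sizeof *s + (size_t)n * sizeof s->data[0]);
    if (s == NULL)
        return NULL;
    s->size = n;
    s->extra = 0.0;
    for (int i = 0; i < n; i++)
        s->data[i] = 0;
    return s;
}

/* No pointer decrementing or casting is needed to read the header. */
int int_array_size(const int_array *s)
{
    return s->size;
}
```

Header and elements live in a single allocation, so one free(s) releases everything.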
Specifically, your code invokes undefined behavior per C17 6.5.6 §7 and §8:
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. /--/
If both the pointer operand and the result point to elements of the same array object... /--/ ...otherwise, the
behavior is undefined.
There is also the issue of pointer aliasing, but (by luck?) it doesn't apply in this case, since data allocated on the heap doesn't have an "effective type" until written to. As long as you read each address through the same pointer type you wrote it with, it is not undefined behavior.
The relevant part regarding misalignment is C17 6.3.2.3/7:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

What you can do to reach your goal, and what is (in my opinion) more readable anyway:
array -= sizeof(double); // get to position where double starts
array -= sizeof(int); // get to position where int starts
NOTE
This only works on some compilers (arithmetic on void* is a GNU extension, not standard C) and only within getArraySize, since you cast the array pointer to void*. So this is also not advisable at all.
But I really think that this is NOT the way to go, and I also recommend using a struct instead, as @Lundin points out.
If you call your getArraySize function with any other pointer, or with a pointer to the expected array but not at the right position, it will most likely end in a segmentation fault.

C programming, pointers run time error

I wrote something like the following code, and I was able to assign a value to the new address after incrementing the pointer, but printing that value caused a run-time error. Also, after assigning a value to the location this pointer points to, the pointer value itself changed to 14. Does anyone have an idea of what's going on?
Why did the pointer value itself change to 14 after assigning a value to the location it points to?
I also did not get any error after incrementing the pointer!
#include <stdio.h>
int main()
{
int x = 10;
int *ptr = &x;
printf("%x\n",ptr); // ptr value
ptr++; //No ERROR !!
printf("%x\n",ptr); //ptr value +4 bytes no error!!!
*ptr = 20;
printf("%x\n",ptr); //ptr=14
printf("%x\n",*ptr); // run time error happens here only
return 0;
}

This is undefined behavior. After you incremented the pointer variable, it pointed one past the variable x (4 bytes past it, on your system). But then you dereferenced it. First of all, the memory you changed was not allocated by you, and it is not part of an object you already own (such as an element of an array). It is undefined behavior to access it.
Again, you can assign the pointer any possible address, but dereferencing it is undefined behavior if the address it points to is invalid.
From the standard, 6.5.3.2 ¶4:
The unary * operator denotes indirection. If the operand points to a
function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If the operand
has type "pointer to type", the result has type type. If an
invalid value has been assigned to the pointer, the behavior of the
unary * operator is undefined.

When you do ptr++, it points "one element" past x. This is allowed, because x in this case is treated as an array of size 1, and a pointer is allowed to point one element past the end of an array. You can also subsequently print the value of that pointer with no problem.
What you can't do however is dereference a pointer to one element past the end. That invokes undefined behavior. In this case that behavior manifested as the pointer having an unexpected value and a subsequent crash.
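A sketch of the well-defined part of the question's code (the function name is hypothetical): incrementing &x, subtracting, and comparing are all fine; only the dereference is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* x behaves like an array of length one for pointer arithmetic, so a
   pointer one past it may be formed and compared, but never dereferenced. */
bool one_past_single_object_ok(void)
{
    int x = 10;
    int *ptr = &x;
    ptr++;                          /* OK: points one past x */
    assert(ptr - &x == 1);          /* OK: both point into that "array" */
    bool same = (ptr == &x + 1);    /* OK: comparison is allowed */
    /* *ptr = 20; would be undefined behavior, so it is never done here */
    return same && x == 10;
}
```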
That being said, here's what probably happened.
ptr was most likely placed right after x in memory, so after doing ptr++, ptr was pointing to itself. So *ptr = 20; had the effect of setting ptr to 20. The value 14 that was printed is in hex: 0x14 is 1*16 + 4 = 20 in decimal. This explains the value that was printed.
Then you tried to print *ptr, which in this case says "print the int value at address 0x14". That is most likely not a valid address, so attempting to read it caused a crash.
You can't however depend on this behavior. You could add an extra printf or compile with different optimization settings and the observed behavior would change.

Is it UB to access an element one past the end of a row of a 2d array?

Is the behavior of the following program undefined?
#include <stdio.h>
int main(void)
{
int arr[2][3] = { { 1, 2, 3 },
{ 4, 5, 6 }
};
int *ptr1 = &arr[0][0]; // pointer to first elem of { 1, 2, 3 }
int *ptr3 = ptr1 + 2; // pointer to last elem of { 1, 2, 3 }
int *ptr3_plus_1 = ptr3 + 1; // pointer to one past last elem of { 1, 2, 3 }
int *ptr4 = &arr[1][0]; // pointer to first elem of { 4, 5, 6 }
// int *ptr3_plus_2 = ptr3 + 2; // this is not legal
/* It is legal to compare ptr3_plus_1 and ptr4 */
if (ptr3_plus_1 == ptr4) {
puts("ptr3_plus_1 == ptr4");
/* ptr3_plus_1 is a valid address, but is it legal to dereference it? */
printf("*ptr3_plus_1 = %d\n", *ptr3_plus_1);
} else {
puts("ptr3_plus_1 != ptr4");
}
return 0;
}
According to §6.5.6 ¶8:
Moreover, if the expression P points to the last element of an
array object, the expression (P)+1 points one past the last
element of the array object.... If both the pointer operand and the
result point to elements of the same array object, or one past the
last element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined. If the result points
one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.
From this, it would appear that the behavior of the above program is undefined; ptr3_plus_1 points to an address one past the end of the array object from which it is derived, and dereferencing this address causes undefined behavior.
Further, Annex J.2 suggests that this is undefined behavior:
An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
There is some discussion of this issue in the Stack Overflow question "One-dimensional access to a multidimensional array: well-defined C?". The consensus there appears to be that this kind of access to arbitrary elements of a two-dimensional array through one-dimensional subscripts is indeed undefined behavior.
The issue, as I see it, is that it is not even legal to form the pointer ptr3_plus_2, so it is not legal to access arbitrary two-dimensional array elements in this way. But it is legal to form the pointer ptr3_plus_1 using this pointer arithmetic. Further, it is legal to compare the two pointers ptr3_plus_1 and ptr4, according to §6.5.9 ¶6:
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to
the start of a different array object that happens to immediately
follow the first array object in the address space.
So, if both ptr3_plus_1 and ptr4 are valid pointers that compare equal and must point to the same address (the object pointed to by ptr4 must be adjacent in memory to the object pointed to by ptr3 anyway, since array storage must be contiguous), it would seem that *ptr3_plus_1 is as valid as *ptr4.
Is this undefined behavior, as described in §6.5.6 ¶8 and Annex J.2, or is this an exceptional case?
To Clarify
It seems unambiguous that it is undefined behavior to attempt to access the element one past the end of the final row of a two-dimensional array. My interest is in the question of whether it is legal to access the first element of the intermediate rows by forming a new pointer using a pointer to an element from the previous row and pointer arithmetic. It seems to me that a different example in Annex J.2 could have made this more clear.
Is it possible to reconcile the clear statement in §6.5.6 ¶8 that an attempted dereference of a pointer to the location one past the end of an array leads to undefined behavior with the idea that the pointer past the end of the first row of a two-dimensional array of type T[][] is also a pointer of type T * that points to an object of type T, namely the first element of an array of type T[]?

So, if both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address
They are.
it would seem that *ptr3_plus_1 is as valid as *ptr4.
It is not.
The pointers are equal, but not equivalent. The trivial well-known example of the distinction between equality and equivalence is negative zero:
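#include <assert.h>
int main(void) {
double a = 0.0, b = -0.0;
assert (a == b); /* +0.0 and -0.0 compare equal */
assert (1/a != 1/b); /* but 1/a is +inf and 1/b is -inf */
}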
Now, to be fair, there is a difference between the two cases: positive and negative zero have different representations, whereas ptr3_plus_1 and ptr4 have the same representation on typical implementations. That is not guaranteed, though, and on implementations where they would have different representations, it should be clear that your code might fail.
Even on the typical implementations, while there are good arguments to be made that the same representation implies equivalent values, to the best of my knowledge, the official interpretation is that the standard does not guarantee this, therefore programs cannot rely on it, therefore implementations can assume programs do not do this and optimise accordingly.

A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong or contrary to the standard about such representation. So any pointer arithmetic that brings the pointer outside the range of [base, base+size] fails, and any dereference outside of [base, base+size) also fails.
Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).
It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
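The hypothetical implementation can be mimicked in ordinary C to make that mental exercise concrete. This is only a sketch: fat_ptr, fat_from_array, fat_add and fat_load are invented names, and real compilers do not expose anything like them. Applied to the row { 1, 2, 3 }, stepping to one past the end succeeds, but loading from there fails, just as *ptr3_plus_1 would under such an implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: tracks the array (here, the row) it was
   derived from, not the whole 2D array. */
typedef struct
{
    int *base;      /* start of the array the pointer points into */
    size_t size;    /* number of elements in that array */
    size_t offset;  /* current position, valid range [0, size] */
} fat_ptr;

fat_ptr fat_from_array(int *base, size_t size)
{
    fat_ptr p = {base, size, 0};
    return p;
}

/* Pointer arithmetic: fails if the result leaves [base, base + size]. */
bool fat_add(fat_ptr *p, ptrdiff_t n)
{
    assert(p != NULL);
    ptrdiff_t target = (ptrdiff_t)p->offset + n;
    if (target < 0 || (size_t)target > p->size)
        return false;            /* this would be UB in real C */
    p->offset = (size_t)target;
    return true;
}

/* Dereference: fails outside [base, base + size). */
bool fat_load(const fat_ptr *p, int *out)
{
    assert(p != NULL && out != NULL);
    if (p->offset >= p->size)
        return false;            /* one past the end, or beyond */
    *out = p->base[p->offset];
    return true;
}
```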

2D Array indexing - undefined behavior?

I recently came across some code doing questionable 2D array indexing operations. Consider, for example, the following code sample:
int a[5][5];
a[0][20] = 3;
a[-2][15] = 4;
a[5][-3] = 5;
Are the indexing operations above subject to undefined behavior?

It's undefined behavior, and here's why.
Multidimensional array access can be broken down into a series of single-dimensional array accesses. In other words, the expression a[i][j] can be thought of as (a[i])[j]. Quoting C11 §6.5.2.1/2:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
This means the above is identical to *(*(a + i) + j). Following C11 §6.5.6/8 regarding addition of an integer and a pointer:
If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
In other words, if i is not a valid index into a, the behavior is immediately undefined, even if "intuitively" a[i][j] seems in-bounds.
So, in the first case, a[0] is valid, but the following [20] is not, because the type of a[0] is int[5]. Therefore, index 20 is out of bounds.
In the second case, a[-2] is already out of bounds, thus already UB.
In the last case, however, the expression a[5] points to one past the last element of the array, which is valid as per §6.5.6/8:
... if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object ...
However, later in that same paragraph:
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
So, while a + 5 is a valid pointer, dereferencing it causes undefined behavior, and a[5][-3] does dereference it; the final [-3] indexing is also out of bounds, and therefore UB as well.
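If the intent behind something like a[0][20] is to address the 21st int of the 5x5 block, one well-defined alternative is to split the flat index into a row and a column, so that each subscript stays within its own dimension. A sketch with a hypothetical set_flat helper:

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 5
#define COLS 5

/* Hypothetical helper: stores value at flat position k (0 <= k < ROWS*COLS)
   by converting k to a row and a column, so both subscripts stay in range. */
void set_flat(int a[ROWS][COLS], size_t k, int value)
{
    assert(k < (size_t)ROWS * COLS);
    a[k / COLS][k % COLS] = value;   /* e.g. k = 20 -> a[4][0] */
}
```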

Indexing an array with a negative index is undefined behaviour. Sorry, but although a[-3] is the same as *(a - 3) and most compilers accept it without warning, the resulting pointer points before the start of the array. C lets you add negative integers to pointers only as long as the result stays within the same array (or one past its end), and a[-3] is defined as exactly that addition. Of course, this is not even checked at runtime.
Also, there are some issues to be aware of when declaring arrays as opposed to pointers. You can leave only the first dimension unspecified, and no more, as in:
int a[][3][2]; /* as a function parameter, this is adjusted to int (*a)[3][2] */
(indeed, in a parameter list the above declares a pointer, not an array: just print sizeof a inside the function)
or
int a[4][3][2]; /* array of 24 integers, size is 24*sizeof(int) */
When you do this, the offset is computed differently for arrays than for pointers, so be careful. In the case of arrays, int a[I][J][K];
&a[i][j][k]
is placed at
(char *)a + i*(sizeof(int)*J*K) + j*(sizeof(int)*K) + k*sizeof(int)
but when you declare
int ***a;
then a[i][j][k] is the same as:
*(*(*(a+i)+j)+k), meaning you take the value of pointer a, add i*sizeof(int **) to it and dereference the result, then add j*sizeof(int *) to that value and dereference again, then add k*sizeof(int) to get the exact address of the data.
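The two cases can be contrasted in code. This sketch assumes dimensions 4 x 3 x 2 for the true array and a tiny 2 x 2 x 2 pointer-to-pointer structure; all names are hypothetical. The first function checks the byte-offset formula for a real array; the second shows that a[i][j][k] on an int *** follows stored pointers instead of computing one offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DIM_I = 4, DIM_J = 3, DIM_K = 2 };  /* dimensions assumed for illustration */

/* True array: &a[i][j][k] sits at one computed byte offset from a. */
bool array_offset_matches(size_t i, size_t j, size_t k)
{
    static int a[DIM_I][DIM_J][DIM_K];
    assert(i < DIM_I && j < DIM_J && k < DIM_K);
    size_t actual = (size_t)((char *)&a[i][j][k] - (char *)a);
    size_t expected = i * (sizeof(int) * DIM_J * DIM_K)
                    + j * (sizeof(int) * DIM_K)
                    + k * sizeof(int);
    return actual == expected;
}

/* int ***: each level is a separate object, so a[i][j][k] loads two
   stored pointers before reaching the int. */
int pointer_cube_lookup(int i, int j, int k)
{
    assert(i >= 0 && i < 2 && j >= 0 && j < 2 && k >= 0 && k < 2);
    int d0[2] = {0, 1}, d1[2] = {2, 3}, d2[2] = {4, 5}, d3[2] = {6, 7};
    int *rows0[2] = {d0, d1};
    int *rows1[2] = {d2, d3};
    int **planes[2] = {rows0, rows1};
    int ***a = planes;

    int via_subscript = a[i][j][k];
    int via_pointers = *(*(*(a + i) + j) + k);  /* the same access, spelled out */
    assert(via_subscript == via_pointers);
    return via_subscript;
}
```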

Is it legal to compare a pointer to the beginning of an array with a pointer of the same type pointing before the beginning of the array?

Is this program legal C? If so, please support your claim (either way) with references to one of the language standards.
void f(char *p) {
char *q = p - 1;
(void)( q < p );
}
int main(void) {
char arr[] = "Hello";
f( arr );
}
In particular, I'm interested in whether the q < p comparison is legal or not.

No, it isn't. Forming a pointer that doesn't point to an element of the array or one past its end (i.e. one outside the range [&arr[0], &arr[size]]) invokes undefined behavior.
C11 Standard, 6.5.6 ¶8 ("Additive operators"):
If both the pointer operand and the result [of P + N] point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

No, this is not legal. A pointer must either point into an array, or one past the end of it, or be null.
ISO C11, Annex J.2 "Undefined behavior", says behavior is undefined when:
Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that does not point into, or just beyond, the same array
object (6.5.6).
This is the case in the line
char *q = p - 1;
when p == &arr[0], and a single line having UB causes the whole program to have UB. Note that you don't have to compare the pointer or dereference it or anything. The subtraction is enough.
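When code genuinely needs to walk backwards, the pointer before the start can be avoided entirely: start at one past the end and decrement only after checking that the start has not been reached. A sketch with a hypothetical reverse_copy helper:

```c
#include <assert.h>
#include <stddef.h>

/* Copies src[0..n-1] into dst in reverse order without ever forming
   src - 1: p is decremented only when it is not already at src. */
void reverse_copy(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    const char *p = src + n;   /* one past the end: OK to form */
    while (p != src) {
        --p;                   /* stays within [src, src + n) */
        *dst++ = *p;
    }
}
```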

I don't know whether it's legal, but it sure does not make sense.
p points to the array, which means it holds the address of the array's first element.
q points to one address block before the array.
Whenever you compare them, you'll be comparing the addresses of two sequential address blocks. On a typical flat-memory implementation the result will be true, since you are basically comparing p - 1 with p.