Iterating over a 2D array with a single char pointer - c

While doing some research on multi-dimensional arrays in C and how they're stored in memory I came across this: "Does C99 guarantee that arrays are contiguous?". The top-voted answer states that "It must also be possible to iterate over the whole array with a (char *)," then provides the following "valid" code:
int a[5][5], i, *pi;
char *pc;
pc = (char *)(&a[0][0]);
for (i = 0; i < 25; i++)
{
    pi = (int *)pc;
    DoSomething(pi);
    pc += sizeof(int);
}
The poster then goes on to say that "Doing the same with an (int *) would be undefined behavior, because, as said, there is no array[25] of int involved."
That line confuses me.
Why does using a char pointer constitute valid, defined behavior while substituting an int pointer does not?
Sorry if the answer to my question should be obvious. :(

The difference between using a char* and an int* comes down to the strict aliasing rules: if you access (&a[0][0])[6] (i.e. via an int*), the compiler is free to assume that the access [6] does not leave the array at a[0]. As such, it is free to assume that (&a[0][0]) + 6 and a[1] + 1 point to different memory locations, even though they don't, and to reorder their accesses accordingly.
The char* is different because it is explicitly exempted from the strict aliasing rules: you can cast a pointer to any object to char* and manipulate the object's bytes through this pointer without invoking undefined behavior.

The standard is very clear that if you have:
int a[5];
int* p = &a[0];
Then
p += 6;
is cause for undefined behavior.
We also know that memory allocated for a 2D array such as
int a[5][5];
must be contiguous. Given that, if we use:
int* p1 = &a[0][0];
int* p2 = &a[1][0];
p1+5 is a legal expression and given the layout of a, it is equal to p2. Hence, if we use:
int* p3 = p1 + 6;
why should that not be equivalent to
int* p3 = p2 + 1;
If p2 + 1 is legal expression, why should p1 + 6 not be a legal expression?
From a purely pedantic interpretation of the standard, using p1 + 6 is cause for undefined behavior. However, it is possible that the standard does not adequately address the issue when it comes to 2D arrays.
In conclusion
From all practical points of view, there is no problem in using p1 + 6.
From a purely pedantic point of view, using p1 + 6 is undefined behavior.

Either an int pointer or a char pointer should work, but the operation differs slightly between the two cases. Assume sizeof(int) is 4: pc += sizeof(int) moves the pointer 4 bytes forward, but pi += sizeof(int) would move it 4 times 4 bytes forward. If you want to use an int pointer, you should use pi++.
EDIT: Sorry about the answer above; using an int pointer does not comply with C99 (although it usually works in practice). The reason is explained well in the original question: moving a pointer across array boundaries is not well defined in the standard. If you use an int pointer, you start from a[0], which is a different array from a[1]. In this case, an int pointer derived from a[0] cannot legally (with well-defined behavior) point to an element of a[1].
SECOND EDIT: Using a char pointer is valid, for the following reason given by the original answer:
the array as a whole must be working when given to memset, memmove or memcpy with the sizeof. It must also be possible to iterate over the whole array with a (char *).
From section 6.5.6 "Additive Operators"
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
So it is reasonable.

Related

Adding the integer to hexadecimal address and How is the pointers calculation done in C? [duplicate]

#include<stdio.h>
int main(void){
int *ptr,a,b;
a = ptr;
b = ptr + 1;
printf("the vale of a,b is %x and %x respectively",a,b);
int c,d;
c = 0xff;
d = c + 1;
printf("the value of c d are %x and %x respectively",c,d);
return 0;
}
The output is:
the vale of a,b is 57550c90 and 57550c94 respectively
the value of c d are ff and 100 respectively%
It turns out that ptr + 1 actually adds 4 to the address. Why does it behave this way?
Because pointers are designed to be compatible with arrays:
*(pointer + offset)
is equivalent to
pointer[offset]
So pointer arithmetic doesn't work in terms of bytes, but in terms of blocks of sizeof(pointer base type) bytes.
Consider what a pointer is... it's a memory address. Every byte in memory has an address. So, if you have an int that's 4 bytes and its address is 1000, 1001 is actually the 2nd byte of that int and 1002 is the third byte and 1003 is the fourth. Since the size of an int might vary from compiler to compiler, it is imperative that when you increment your pointer you don't get the address of some middle point in the int. So, the job of figuring out how many bytes to skip, based on your data type, is handled for you and you can just use whatever value you get and not worry about it.
As Basile Starynkvitch points out, this amount will vary depending on the sizeof property of the data member pointed to. It's very easy to forget that even though addresses are sequential, the pointers of your objects need to take into account the actual memory space required to house those objects.
Pointer arithmetic is a tricky subject. A pointer addition means passing to some next pointed element. So the address is incremented by the sizeof the pointed element.
Short answer
The address of the pointer will be incremented by sizeof(T) where T is the type pointed to. So for an int, the pointer will be incremented by sizeof(int).
Why?
Well first and foremost, the standard requires it. The reason this behaviour is useful (other than for compatibility with C) is because when you have a data structure which uses contiguous memory, like an array or an std::vector, you can move to the next item in the array by simply adding one to the pointer. If you want to move to the nth item in the container, you just add n.
Being able to write firstAddress + 2 is far simpler than firstAddress + (sizeof(T) * 2), and helps prevent bugs arising from developers assuming sizeof(int) is 4 (it might not be) and writing code like firstAddress + (4 * 2).
In fact, when you say myArray[4], you're saying *(myArray + 4). This is the reason that array indices start at 0; you just add 0 to get the first element (i.e. myArray points to the first element of the array) and n to get the element at index n.
What if I want to move one byte at a time?
sizeof(char) is guaranteed to be one byte in size, so you can use a char* if you really want to move one byte at a time.
A pointer is used to point to a specific byte of memory marking where an object has been allocated (technically it can point anywhere, but that's how it's used). When you do pointer arithmetic, it operates based on the size of the objects pointed to. In your case, it's a pointer to integers, which have a size of 4 bytes each on your platform.
Let's consider a pointer p. The expression p+n is like (unsigned char *)p + n * sizeof *p (because sizeof(unsigned char) == 1).
Try this:
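#include <stdio.h>
#define N 3
int main(void)
{
    int arr[N + 1];   /* large enough that p + N still points into the array */
    int *p = arr;
    printf("%p\n", (void *)p);
    printf("%p\n", (void *)(p + N));
    /* same address, computed in bytes */
    printf("%p\n", (void *)((unsigned char *)p + N * sizeof *p));
    return 0;
}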

Access an array from the end in C?

I recently noticed that in C, there is an important difference between array and &array for the following declaration:
char array[] = {4, 8, 15, 16, 23, 42};
The former decays to a pointer to char while the latter is a pointer to an array of 6 chars. It is also notable that writing a[b] is syntactic sugar for *(a + b). Indeed, you could write 2[array] and it works perfectly according to the standard.
So we could take advantage of this information to write this:
char last_element = (&array)[1][-1];
&array points to an object with a size of 6 chars, so (&array)[1] is a pointer to chars located right after the array. By looking at [-1] I am therefore accessing the last element.
With this I could, for example, reverse the entire array:
void swap(char *a, char *b) { *a ^= *b; *b ^= *a; *a ^= *b; }
int main() {
char u[] = {1,2,3,4,5,6,7,8,9,10};
for (int i = 0; i < sizeof(u) / 2; i++)
swap(&u[i], &(&u)[1][-i - 1]);
}
Does this method for accessing an array by the end have flaws?
The C standard does not define the behavior of (&array)[1].
Consider &array + 1. This is defined by the C standard, for two reasons:
When doing pointer arithmetic, the result is defined for results from the first element (with index 0) of an array to one beyond the last element.
When doing pointer arithmetic, a pointer to a single object behaves like a pointer to an array with one element. In this case, &array is a pointer to a single object (that is itself an array, but the pointer arithmetic is for the pointer-to-the-array, not a pointer-to-an-element).
So &array + 1 is defined pointer arithmetic that points just beyond the end of array.
However, by definition of the subscript operator, (&array)[1] is *(&array + 1). While the &array + 1 is defined, applying * to it is not. C 2018 6.5.6 8 explicitly tells us, about result of pointer arithmetic, “If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.”
Because of the way most compilers are designed, the code in the question may move data around as you desire. However, this is not a behavior you should rely on. You can obtain a good pointer to just beyond the last element of the array with char *End = array + sizeof array / sizeof *array;. Then you can use End[-1] to refer to the last element, End[-2] to refer to the penultimate element, and so on.
Although the Standard specifies that arrayLvalue[i] means (*((arrayLvalue)+(i))), which would be processed by taking the address of the first element of arrayLvalue, gcc sometimes treats [], when applied to an array-type value or lvalue, as an operator which behaves like an indexed version of .member syntax, yielding a value or lvalue which the compiler will treat as being part of the array type. I don't know if this is ever observable when the array-type operand isn't a member of a struct or union, but the effects are clearly demonstrable in cases where it is, and I know of nothing that would guarantee that similar logic wouldn't be applied to nested arrays.
struct foo {unsigned char x[12];};
int test1(struct foo *p1, struct foo *p2)
{
    p1->x[0] = 1;
    p2->x[1] = 2;            /* indexed directly on the array member */
    return p1->x[0];
}
int test2(struct foo *p1, struct foo *p2)
{
    p1->x[0] = 1;
    (&p2->x[0])[1] = 2;      /* indexed through a pointer to the first element */
    return p1->x[0];
}
The code gcc generates for test1 will always return 1, while the generated code for test2 will return whatever is in p1->x[0]. I am unaware of anything in the Standard or the documentation for gcc that would suggest the two functions should behave differently, nor how one should force a compiler to generate code that would accommodate the case where p1 and p2 happen to identify overlapping parts of an allocated block in the event that should be necessary. Although the optimization used in test1() would be reasonable for the function as written, I know of no documented interpretation of the Standard that would treat that case as UB but define the behavior of the code if it wrote to p2->x[0] instead of p2->x[1].
I would use a for loop where I set i to the length of the array minus 1 and, instead of increasing it each time, decrease it for as long as it is greater than or equal to 0.
for (int i = (int)(sizeof vet / sizeof *vet) - 1; i >= 0; i--)

Does "a[0]" mean something more than just "a"? [duplicate]

A friend of mine and I are arguing about this, so we thought
we could get an appropriate answer here, with the corresponding explanation.
int a[0];
Is a[0] an array, and does it have any advantages?
(I don't only want to know what happens if I define a and a[0], but also what the advantages are and how it is more than just the variable.)
As a standalone stack variable, it is not useful. However, as the last member of a struct, it can be used as a variable length array and doing this used to be a common technique. For example:
struct foo {
int a[0];
};
struct foo* bar = malloc( sizeof *bar + 10 * sizeof *bar->a );
/* now you can (possibly) use bar->a as an array of 10 ints */
Note that this is a non-standard, non-portable hack and is certainly not good practice in new code.
YES, they are different in their functionality.
Let's first look at their memory addresses.
int a[0];
printf("%p / %p", (void *)&a[0], (void *)&a);
As you can test, a[0] uses the same address as a.
That means they are at the same location, but that doesn't mean their behaviour is the same.
That is because &a always points to a[0], which is the only memory area allocated here.
However, despite the logic, a[0] IS considered an array.
Many people say that an array needs to use more than one memory block, but that's wrong.
That is because of the index operator [], which is what actually makes something an array (a systematic arrangement of objects).
Even if there is only one object, as appears here with a[0], it is accepted as an array by the compiler and the debugger. And you will have to
initialize the array in case you want to assign a value INTO it.
Valid:
int a[] = {1};
Valid:
int a[0] = {1};
Invalid:
int a[0] = 1;
On top of that, converting a pointer to the array is valid as follows:
int a[0];
int* p;
p = a;
However, you can't just set p to point to a single block of memory; it is reserved for the datatype. In the end we have an integer that points to itself.
The conclusion is that the compiler, the debugger, and you yourself can treat it like an array, but you can't use it
(or at least can't expect a proper result).
All that means that the difference between a and a[0] is the ability of a to point to its address.

Does C99 guarantee that arrays are contiguous?

Following a heated comment thread in another question, I came to debate what is and is not defined in the C99 standard about C arrays.
Basically, when I define a 2D array like int a[5][5], does the C99 standard guarantee that it will be a contiguous block of ints? Can I cast it to (int *)a and be sure I will have a valid 1D array of 25 ints?
As I understand the standard, the above property is implicit in the sizeof definition and in pointer arithmetic, but others seem to disagree and say that casting the above structure to (int *) gives undefined behavior (even if they agree that all existing implementations actually allocate contiguous values).
More specifically, consider an implementation that instruments arrays to check array boundaries for all dimensions and returns some kind of error when the array is accessed as a 1D array, or does not give correct access to elements beyond the first row. Could such an implementation be standard compliant? And in this case, what parts of the C99 standard are relevant?
We should begin with inspecting what int a[5][5] really is. The types involved are:
int
array[5] of ints
array[5] of arrays
There is no array[25] of ints involved.
It is correct that the sizeof semantics imply that the array as a whole is contiguous. The array[5] of ints must have 5*sizeof(int), and recursively applied, a[5][5] must have 5*5*sizeof(int). There is no room for additional padding.
Additionally, the array as a whole must be working when given to memset, memmove or memcpy with the sizeof. It must also be possible to iterate over the whole array with a (char *). So a valid iteration is:
int a[5][5], i, *pi;
char *pc;
pc = (char *)(&a[0][0]);
for (i = 0; i < 25; i++)
{
    pi = (int *)pc;
    DoSomething(pi);
    pc += sizeof(int);
}
Doing the same with an (int *) would be undefined behaviour, because, as said, there is no array[25] of int involved. Using a union as in Christoph's answer should be valid, too. But there is another point complicating this further, the equality operator:
6.5.9.6
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. 91)
91) Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
This means for this:
int a[5][5], *i1, *i2;
i1 = &a[0][0] + 5;
i2 = &a[1][0];
i1 compares as equal to i2. But when iterating over the array with an (int *), it is still undefined behaviour, because it is originally derived from the first subarray. It doesn't magically convert to a pointer into the second subarray.
Even when doing this
char *c = (char *)(&a[0][0]) + 5*sizeof(int);
int *i3 = (int *)c;
won't help. It compares equal to i1 and i2, but it isn't derived from any of the subarrays; it is a pointer to a single int or an array[1] of int at best.
I don't consider this a bug in the standard. It is the other way around: Allowing this would introduce a special case that violates either the type system for arrays or the rules for pointer arithmetic or both. It may be considered a missing definition, but not a bug.
So even if the memory layout for a[5][5] is identical to the layout of a[25], and the very same loop using a (char *) can be used to iterate over both, an implementation is allowed to blow up if one is used as the other. I don't know why it should or know any implementation that would, and maybe there is a single fact in the Standard not mentioned till now that makes it well defined behaviour. Until then, I would consider it to be undefined and stay on the safe side.
I've added some more comments to our original discussion.
sizeof semantics imply that int a[5][5] is contiguous, but visiting all 25 integers by incrementing a pointer like int *p = *a is undefined behaviour: pointer arithmetic is only defined as long as all pointers involved lie within (or one element past the last element of) the same array, which, for example, &a[2][1] and &a[3][1] do not (see C99 section 6.5.6).
In principle, you can work around this by casting &a - which has type int (*)[5][5] - to int (*)[25]. This is legal according to 6.3.2.3 §7, as it doesn't violate any alignment requirements. The problem is that accessing the integers through this new pointer is illegal as it violates the aliasing rules in 6.5 §7. You can work around this by using a union for type punning (see footnote 82 in TC3):
int *p = ((union { int multi[5][5]; int flat[25]; } *)&a)->flat;
This is, as far as I can tell, standards compliant C99.
If the array is static, like your int a[5][5] array, it's guaranteed to be contiguous.

Resources