Get Product without using * operator hack-y way - c

Can someone please help me understand how the following logic ends up producing the product of a and b?
int getProd(int a, int b){
    return (uintptr_t)&((char (*) [a])0x0)[b];
}

Suppose we have a pointer p, which points to objects of size a.
If we then say p + b, we're asking for a pointer to the b'th object past where p points.
So the actual new pointer value (on a byte-addressed machine, anyway), is going to be scaled by a, that is, the size of the pointed-to objects. That is, "under the hood", the compiler is going to do something more like p + b * a.
So we can see the multiplication a * b is happening -- but then it's getting added to the original value of p.
So if we use an initial value of 0, we'll get just a * b. And that's what the hacky getProd function is doing.
Let's break it down:
0x0
The value 0, also known in pointer contexts as a null pointer. [Footnote: there's more complexity to this definition, but let's not worry about that for the moment.]
char (*) [a]
This is a type: "pointer to char array of size a".
(char (*) [a])0x0
This is a cast: take that null pointer, cast it to the type "pointer to array [a] of char".
((char (*) [a])0x0)[b]
Take that pointer, imagine it points to an array, and fetch the b'th element of that array. Since array indexing is the same as pointer arithmetic, this will end up computing 0 + a * b.
&((char (*) [a])0x0)[b];
We had a reference to the b'th element of the "array". Now compute a pointer to that element. That pointer should literally have the value 0 + a * b.
(uintptr_t)&((char (*) [a])0x0)[b];
Finally, take that pointer and cast it to an integer type.
Now, with all of this said, it must be pointed out that this is a hack. Writing code to perform arithmetic on null pointers in this way is highly problematic. It might be almost-but-not-quite-legal; it might be legal-but-just-barely-legal. You could argue for hours about which side of the line the answer falls on.
In this case, of course, it's an academic argument, because no one would ever seriously propose doing multiplication this way.
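To see the same scaling at work without touching a null pointer, here is a minimal sketch. It assumes a compiler that supports variably modified types (required in C99, optional in C11), and product_via_stride is a hypothetical name, not something from the question. The pointer steps over real storage obtained from calloc, so the arithmetic stays inside one allocated object (or one past its end).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not from the question: returns a * b by letting the
   compiler scale a pointer step. The only '*' characters below appear in
   type names and comments, never as a multiplication operator. */
size_t product_via_stride(size_t a, size_t b)
{
    if (a == 0 || b == 0)
        return 0;                       /* zero-sized rows are not allowed */

    /* Real storage for b rows of a bytes each; calloc sizes it for us. */
    char (*rows)[a] = calloc(b, a);
    if (rows == NULL)
        return 0;                       /* assumption: report failure as 0 */

    char (*end)[a] = rows + b;          /* one past the last row: permitted */
    size_t bytes = (size_t)((char *)end - (char *)rows);

    free(rows);
    return bytes;
}
```

Calling product_via_stride(3, 4) steps a pointer to 3-byte rows forward by 4 rows, so the byte distance is 12. This is still a curiosity rather than a recommendation: it allocates memory just to multiply two numbers.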

This code invokes undefined behavior by performing pointer arithmetic on an invalid pointer. That being said, here's what it's attempting to do.
(char (*) [a])0x0 is casting the value 0 to a pointer to an array of size a of char, giving you a pointer to an object that takes up a bytes.
Then with &((char (*) [a])0x0)[b] it uses array indexing to get the b'th element this pointer points to and takes its address.
Also, because an expression of the form E1[E2] is exactly the same as *(E1 + E2), this means the prior expression is the same as &*(((char (*) [a])0x0) + b), and because & followed by * cancel out, this is the same as ((char (*) [a])0x0) + b. So there's no dereferencing of an invalid pointer.
Because pointer arithmetic increments the value of a pointer by the offset times the element size, you now have a pointer whose numeric value is a*b. That value is then converted to an integer type and returned.
Where the undefined behavior comes into play is in the implicit + operator in the array indexing. Pointer arithmetic is only valid if the original pointer and the result of the addition both point to elements of the same array object (or one element past its end). Since a pointer created from the value 0 does not point to any object, this is UB.

Technically this is undefined behavior. But the intended functionality, assuming naive compiler logic, is as follows.
((char (*) [a])0x0) - this takes an address 0x0 and is casting it to a pointer to array of a char elements, that is a pointer to an object of size a bytes.
Now, according to C pointer arithmetic any operation (addition/subtraction) with this pointer will be performed in the multiples of a.
Next, it is taking the b offset of this pointer. As we know, p[b] is equivalent to *(p + b) for any pointer p. In our case p is equal to 0x0 and is a pointer to an object of size a. Therefore p + b will have a numerical value of 0x0 + b * sizeof(*p) or 0x0 + a * b. Which is exactly a * b.

Related

Why is this code involving arrays and pointers behaving as it does?

I was asked what the output of the following code is:
int a[5] = { 1, 3, 5, 7, 9 };
int *p = (int *)(&a + 1);
printf("%d, %d", *(a + 1), *(p - 1));
1) 3, 9
2) Error
3) 3, 1
4) 2, 1
The answer is No. 1.
It is easy to see that *(a + 1) is 3.
But how about int *p = (int *)(&a + 1); and *(p - 1) ?
The answer to this could be either "1) 3,9" or "2) Error" (or more specifically undefined behavior) depending on how you read the C standard.
First, let's take this:
&a + 1
The & operator takes the address of the array a giving us an expression of type int(*)[5] i.e. a pointer to an array of int of size 5. Adding 1 to this treats the pointer as pointing to the first element of an array of int [5], with the resulting pointer pointing to just after a.
Also, even though &a points to a singular object (in this case an array of type int [5]) we can still add 1 to this address. This is valid because 1) a pointer to a singular object can be treated as a pointer to the first element of an array of size 1, and 2) a pointer may point to one element past the end of an array.
Section 6.5.6p7 of the C standard states the following regarding treating a pointer to an object as a pointer to the first element of an array of size 1:
For the purposes of these operators, a pointer to an object
that is not an element of an array behaves the same as a pointer
to the first element of an array of length one with the type of the
object as its element type.
And section 6.5.6p8 says the following regarding allowing a pointer to point to just past the end of an array:
When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to the
i-th element of an array object, the expressions (P)+N
(equivalently, N+(P)) and (P)-N (where N has the value n) point to,
respectively, the i+n-th and i−n-th elements of the array object,
provided they exist. Moreover, if the expression P points to the
last element of an array object, the expression (P)+1 points one past
the last element of the array object, and if the expression Q
points one past the last element of an array object, the
expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to
elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined. If the result points
one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.
Now comes the questionable part, which is the cast:
(int *)(&a + 1)
This converts the pointer of type int(*)[5] to type int *. The intent here is to change the pointer which points to the end of the 1-element array of int [5] to the end of the 5-element array of int.
However the C standard isn't clear on whether this conversion and the subsequent operation on the result is allowed. It does allow conversion from one object type to another and back, assuming the pointer is properly aligned. While the alignment shouldn't be an issue, using this pointer is iffy.
So this pointer is assigned to p:
int *p = (int *)(&a + 1)
Which is then used as follows:
*(p - 1)
If we assume that p validly points to one element past the end of the array a, subtracting 1 from it results in a pointer to the last element of the array. The * operator then dereferences this pointer to the last element, yielding the value 9.
So if we assume that (int *)(&a + 1) results in a valid pointer, then the answer is 1) 3,9 otherwise the answer is 2) Error.
In the line
int *p = (int *)(&a + 1);
note that &a is being written, not a. This is important.
If simply a had been written, then the array would have decayed to a pointer to the first element, i.e. to &a[0]. However, since the expression &a was used instead, the result of this expression has the same value as if a or &a[0] had been used, but the type is different: The type is a pointer to an array of 5 int elements, instead of a pointer to a single int element.
According to the rules on pointer arithmetic, incrementing a pointer by 1 will increase the memory address by the size of the object that it is pointing to. Since the pointer is not pointing to a single element, but to an array of 5 elements, the memory address will be incremented by 5 * sizeof(int). Therefore, after incrementing the pointer, the value of (but not type of) the pointer will be equivalent to &a[5], i.e. one past the end of the array.
After casting this pointer to int * and assigning the result to p, the expression p is fully equivalent to &a[5] (both in value and in type).
Therefore, the expression *(p - 1) is equivalent to *(&a[5] - 1), which is equivalent to *(&a[4]), or simply a[4].
This:
&a + 1;
is taking the address of a, an array, and adding 1, which adds the size of one a, i.e. 5 integers. Then the indexing "backs down", one integer, ending up in the final element of a.
Normally whenever arrays are used in expressions, they "decay" into a pointer to the first element. There are a few exceptions to this rule and one such exception is the & operator.
&a therefore yields a pointer to the array of type int (*)[5]. Then &a + 1 is pointer arithmetic on such a type, meaning the pointer address is increased by the size of one int [5]. We end up pointing just beyond the array, but C actually allows us to do that as long as we don't de-reference that location.
Then the pointer is converted to int * by the cast, which we can do too: C allows pretty much any manner of wild pointer conversions as long as we don't dereference the result or cause misalignment.
p - 1 does pointer arithmetic on type int and the actual type of data in the array is also int, so we are allowed to de-reference that location. We end up at the last item of the array.
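The steps discussed in these answers can be checked with a small sketch. The function names are hypothetical, and the second function simply mirrors the quiz expression, so whatever doubts apply to the cast in the quiz apply to it as well. The third function is the uncontroversial way to reach the same element.

```c
#include <stddef.h>

static int a[5] = { 1, 3, 5, 7, 9 };

/* Bytes between &a and &a + 1: one whole int[5], not one int. */
size_t step_of_array_pointer(void)
{
    return (size_t)((char *)(&a + 1) - (char *)&a);
}

/* Same expression as in the quiz: p is one past the end, p - 1 is &a[4]. */
int last_via_quiz_expression(void)
{
    int *p = (int *)(&a + 1);
    return *(p - 1);
}

/* Conventional spelling of the same element, using only sizeof. */
int last_via_sizeof(void)
{
    return a[sizeof a / sizeof a[0] - 1];
}
```

On typical implementations both last_via_quiz_expression() and last_via_sizeof() return 9, and step_of_array_pointer() returns sizeof(int[5]).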

How do variable types affect pointer arithmetic in C?

I'm having trouble understanding pointer's arithmetic.
Let int B=0, *p=&B, **V=&p and sizeof(int)=4, sizeof(int *)=8
What does the instruction (*V)[1] do?
To me, what I see is that (*V)[1] is equivalent to *(*V+1), so what should happen is, we dereference V (which is a pointer to a pointer to an int) and add 1 to the content of that variable, which is an address. That variable is a pointer and we're assuming sizeof(int *)=8, so in theory we should add 1 * sizeof(int *) (which is 8) to whatever address is stored in the pointer p to which the pointer V points.
The solution, however, says to add 4 (1 * sizeof(int)). Is it wrong or is my thinking wrong?
The solution you reference is correct.
The expression *V has type int *, so it points to an array of 1 or more int. Because it points to an int, when pointer arithmetic happens the size of the datatype it points to (sizeof(int), i.e. 4) is multiplied by the given value (1). So if you were to print the values of *V and *V + 1 you would see that they differ by 4.
There is however a problem with (*V)[1], equivalently *(*V + 1). Since *V points to B, *V + 1 points one element past B. This is legal since a pointer can point to one element past the end of an array (or equivalently a single object which is treated as an array of size 1). What is not legal however is to dereference that pointer. Doing so invokes undefined behavior.
(*V)[1] is indeed equivalent to *(*V+1).
Since V is &p (by initialization), *V is p. So we have *(p+1).
Note that both *V and p have type int *. They point to an int, so p+1 points to “the next” int.
Since p points to B (by initialization), and B is a single int, p+1 points just past the end of B (where the “next int” would be if we had an array of int there instead of a single int).
This “just past the end of B” is allowed for a pointer, and it is the location your source refers to for the solution that (*V)[1] effectively adds four bytes to the location that *V points to.
However, while it is allowed to refer to one past the end of B, the C standard does not define the behavior of attempting to access an object there. (*V+1) is a defined pointer, but *(*V+1) is not a defined expression for an object at that location. Its behavior is not defined by the C standard.
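A minimal sketch of the point both answers make: the step is sizeof(int) because *V has type int *, regardless of how large a pointer is. The function name is hypothetical, and the code only computes the one-past pointer, it never dereferences it.

```c
#include <stddef.h>

/* Returns the number of bytes between *V and *V + 1. */
size_t step_of_int_pointer(void)
{
    int B = 0, *p = &B, **V = &p;

    int *next = *V + 1;     /* one past B: fine to compute, not to dereference */

    /* Measure the distance in bytes by viewing both pointers as char *. */
    return (size_t)((char *)next - (char *)*V);
}
```

The result equals sizeof(int) (4 under the question's assumptions), not sizeof(int *).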

Is `*((*(&array + 1)) - 1)` safe to use to get the last element of an automatic array?

Suppose I want to get the last element of an automatic array whose size is unknown. I know that I can make use of the sizeof operator to get the size of the array and get the last element accordingly.
Is using *((*(&array + 1)) - 1) safe?
Like:
char array[SOME_SIZE] = { ... };
printf("Last element = %c", *((*(&array + 1)) - 1));
int array[SOME_SIZE] = { ... };
printf("Last element = %d", *((*(&array + 1)) - 1));
etc
No, it is not.
&array is of type pointer to char[SOME_SIZE] (in the first example given). This means &array + 1 points to memory immediately past the end of array. Dereferencing that (as in *(&array + 1)) gives undefined behaviour.
No need to analyse further. Once there is any part of an expression that gives undefined behaviour, the whole expression does.
I don't think it is safe.
From the standard, as @dasblinkenlight quoted in his answer (now removed), there is also something I would like to add:
C99 Section 6.5.6.8 -
[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
So, as it says, we should not do *(&array + 1), as that pointer is one past the last element of the array and so * should not be used on it.
It is also well known that dereferencing pointers to memory you are not allowed to access leads to undefined behaviour.
I believe it's undefined behavior for the reasons Peter mentions in his answer.
There is a huge debate going on about *(&array + 1). On the one hand, dereferencing &array + 1 seems to be legal because it's only changing the type from T (*)[] back to T [], but on the other hand, it's still a pointer to uninitialized, unused and unallocated memory.
My answer relies on the following:
C99 6.5.6.7 (Semantics of additive operators)
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
Since &array is not a pointer to an object that is an element of an array, then according to this, it means that the code is equivalent to:
char array_equiv[1][SOME_SIZE] = { ... };
/* ... */
printf("Last element = %c", *((*(&array_equiv[0] + 1)) - 1));
That is, &array is a pointer to an array of SOME_SIZE chars, so it behaves the same as a pointer to the first element of an array of length 1 where each element is an array of SOME_SIZE chars.
Now, that together with the clause that follows (already mentioned in other answers; this exact excerpt is blatantly stolen from ameyCU's answer):
C99 Section 6.5.6.8 -
[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
Makes it pretty clear that it is UB: it's equivalent to dereferencing a pointer that points one past the last element of array_equiv.
Yes, in real world, it probably works, as in reality the original code doesn't really dereference a memory location, it's mostly a type conversion from T (*)[] to T [], but I'm pretty sure that from a strict standard-compliance point of view, it is undefined behavior.
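For comparison, the sizeof-based approach the question mentions avoids the disputed expression entirely. A minimal sketch follows; ARRAY_LEN and ARRAY_LAST are hypothetical macro names, and they only work on true arrays (not on pointers, and not on zero-length arrays).

```c
#include <stddef.h>

/* Hypothetical helpers: valid only when arr is an actual array object. */
#define ARRAY_LEN(arr)  (sizeof (arr) / sizeof (arr)[0])
#define ARRAY_LAST(arr) ((arr)[ARRAY_LEN(arr) - 1])

/* Last element of a char array whose size is deduced from its initializer. */
char last_char_example(void)
{
    char array[] = { 'x', 'y', 'z' };
    return ARRAY_LAST(array);
}

/* Same macro, different element type. */
int last_int_example(void)
{
    int array[] = { 10, 20, 30, 40 };
    return ARRAY_LAST(array);
}
```

Here last_char_example() yields 'z' and last_int_example() yields 40, with no pointer ever formed past the end of the array.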
It is probably safe, but there are some caveats.
Suppose we have
T array[LEN];
Then &array is of type T(*)[LEN].
Next, &array + 1 is again of type T(*)[LEN], pointing just past the end of the original array.
Next, *(&array + 1) is of type T[LEN], which may be implicitly converted to T*, still pointing just past the end of the original array. (So we did NOT dereference an invalid memory location: the * operator is not evaluated).
Next, *(&array + 1) - 1 is of type T*, pointing at the last array location.
Finally, we dereference this (which is legitimate if the array length is not zero): *(*(&array + 1) - 1) gives the last array element, a value of type T.
Note that the only time we actually dereference a pointer is in this last step.
Now, the potential caveats.
First, *(&array + 1) formally appears like an attempt to dereference a pointer that points to an invalid memory location. But it really isn't. That's the nature of array pointers: this formal dereference only changes the type of the pointer and does not actually result in an attempt to retrieve a value from the referenced location. That is, array is of type T[LEN] but it may be implicitly converted to type T *, pointing to the first element of the array; &array is a pointer to type T[LEN], pointing at the beginning of the array; *(&array+1) is again of type T[LEN], which may be implicitly converted to type T *. At no point is a pointer actually dereferenced.
Second, &array + 1 may look like an invalid address, but it really isn't: My C++11 reference manual tells me explicitly that "Taking a pointer to the element one beyond the end of an array is guaranteed to work", and a similar statement is also made in K&R, so I believe it has always been standard behavior.
Finally, in case of a zero-length array, the expression dereferences the memory location just before the array, which may be unallocated/invalid. But this issue would also arise if one used a more conventional approach using sizeof() without testing for nonzero length first.
In short, I do not believe there is anything undefined or implementation-dependent about this expression's behavior.
Imho that might work but is probably unwise. You should carefully review your software design and ask yourself why you want the last entry of the array. Is the content of the array completely unknown to you, or is it possible to define the structure in terms of C structs and unions? If that is the case, stay away from complex pointer operations in a char array, for example, and define the data properly in your C code, in structs and unions wherever possible.
So instead of :
printf("Last element = %c", *((*(&array + 1)) - 1));
It could be :
printf("Checksum = %c", myStruct.MyUnion.Checksum);
This clarifies your code. The last element of your array means nothing to a person not familiar with what's in the array; myStruct.MyUnion.Checksum makes sense to anyone. Studying the myStruct structure could explain the whole data structure to anyone. Please use something like that if the data can be declared in such a way. If you are in the rare situation where you cannot, study the answers above; they make good sense, I think.
a)
If both the pointer operand and the result [of P + N] point to
elements of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
[...]
if the expression P points either to an element of an array
object or one past the last element of an array object, and the
expression Q points to the last element of the same array object, the
expression ((Q)+1)−(P) has the same value as ((Q)−(P))+1 and as
−((P)−((Q)+1)), and has the value zero if the expression P points one
past the last element of the array object, even though the expression
(Q)+1 does not point to an element of the array object.
This states that computations using a pointer one past the last element are actually completely fine. As some people here have written that the use of non-existent objects for computations is already illegal, I thought I would include that part.
Then we need to take care about this part:
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is
evaluated.
There is one important part that the other answers omitted and that is:
If the pointer operand points to an element of an array object
This is not the case. The pointer operand we dereference is not a pointer to an element of an array object, it is a pointer to a pointer. So this whole clause is completely irrelevant. But it is also stated:
For the purposes of these [additive] operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
What does this mean?
It means our pointer to a pointer is actually again a pointer to an array - of length[1]. And now we can close the loop, because as the first paragraph states, we are allowed to make calculations with one past the array, so we are allowed to make calculations with the array as if it would be an array of length[2]!
In a more graphical way:
ptr -> (ptr to int[10])[0] -> int[10]
-> (ptr to int[10])[1]
So, we are allowed to make calculations with (ptr to int[10])[1], even though it is technically outside the array of length[1].
b)
The steps that happen are:
array: ptr of type int[SOME_SIZE] to the first element of array
&array: ptr to a ptr of type int[SOME_SIZE] to the first element of array
+ 1: ptr, one more than the ptr of type int[SOME_SIZE] to the first element of array, to a ptr of type int
This is NOT yet a pointer to int[SOME_SIZE+1], according to C99 Section 6.5.6.8. This is NOT yet ptr + SOME_SIZE + 1
*: We dereference the pointer to the pointer. NOW, after the dereferencing, we have a pointer according to C99 Section 6.5.6.8, which is past the element of the array and which is not allowed to be dereferenced. This pointer is allowed to exist and we are allowed to use operators on it, except the unary * operator. But we don't use that one on that pointer yet.
- 1: Now we subtract one from the ptr of type int to one after the last element of the array, letting ptr point to the last element of the array.
*: dereferencing a ptr to int to the last element of the array, which is legal.
c)
And last, but not least:
If it were illegal, then the offsetof macro would be illegal, too, which is often defined as:
((size_t)(&((st *)0)->m))
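Whatever one concludes from that argument, user code does not need to spell out the null-pointer form by hand: offsetof is provided by <stddef.h>, and an implementation is free to define it however it likes internally. A minimal sketch, where struct sample and the function names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical structure used only for illustration. */
struct sample {
    char tag;
    int  value;
};

/* Offset of the first member: always 0. */
size_t tag_offset(void)
{
    return offsetof(struct sample, tag);
}

/* Offset of the second member: at least sizeof(char), possibly more
   because of padding inserted for alignment. */
size_t value_offset(void)
{
    return offsetof(struct sample, value);
}
```

The exact value of value_offset() depends on the platform's alignment rules, so the sketch deliberately does not state one.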

Why does the compiler not show an error on expressions that use arrays like pointers?

I'm new to C programming and currently learning about arrays and strings. I'm quite confused about this topic. Coming to my question:
Since the name of an array (for example a[]={20,44,4,8}) decays into a pointer constant in an expression, whenever I try to do pointer arithmetic such as a=a+1, the compiler shows an error. But when I write the same thing in a printf() call, it shows the address of the first element rather than an error. Why?
In an expression such as *(a+1)=2, first (a+1) will be evaluated and then * will dereference it. My question is: if a is a pointer constant, how can it point to any other memory location in the array, and how is this expression perfectly legal?
I tried to search for this but couldn't find an accurate answer.
Although an array name evaluates to a pointer in some expressions, your a = a+1 assignment tries to assign to an array, which is not allowed.
On the other hand, the expression a+1 is allowed, and it evaluates to another pointer. When you pass this value to printf, the function happily prints it. Do not forget to cast the result to void* when you print:
printf("%p\n", (void*)(a+1));
if a is a pointer constant then how it can point to any other memory location in an array and how is *(a+1) expression perfectly legal?
For the same reason that 2+3, a combination of two constants, produces a value that is neither a 2 nor a 3. In your example, a+1 expression does not modify a. Instead, the expression uses it as a "starting point", computes a different value (which happens to be of type pointer), and leaves a unchanged.
The name of the array a is not quite the same as a pointer constant. It merely
acts like a pointer constant in some circumstances. In other circumstances it will
act quite differently; for example, sizeof(a) may have a much larger value
than sizeof(b) where b is truly a pointer.
This code is legal:
int a[] = {20,44,4,8};
int *b;
b = a;
b = b + 1;
because a is enough like a pointer that you can set b to point to the same
address but, unlike a, b really is a pointer and it can be modified.
The last line of code could just as well be:
b = a + 1;
because the right-hand side here is not trying to modify a; it is merely using
the address of the first element of a to compute a new address.
The expression *(a + 1) is effectively another way of writing a[1].
You know what will happen when you write a[1] = 2, right?
It will change what is stored in the second element of a.
(The first element is always a[0] whether you do anything with it or not.)
Storing a new value in a[1] doesn't change the location of the array a.
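The points in this answer can be put together in a short sketch. The function names are hypothetical; the array contents are the ones from the question.

```c
#include <stddef.h>

/* sizeof sees the whole array here, not a pointer, so this yields 4. */
size_t element_count(void)
{
    int a[] = { 20, 44, 4, 8 };
    return sizeof a / sizeof a[0];
}

/* Uses a in pointer arithmetic without ever modifying a itself. */
int store_then_read(void)
{
    int a[] = { 20, 44, 4, 8 };
    int *b = a + 1;     /* a decays to &a[0]; a itself is unchanged */

    *(a + 1) = 2;       /* same effect as a[1] = 2 */
    /* a = a + 1;          would not compile: arrays are not assignable */

    return *b;          /* b still points at a[1], which now holds 2 */
}
```

store_then_read() returns 2: the store through *(a + 1) changed the element, while the array stayed where it was.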
When an array decays into a pointer, the resulting value is an rvalue: a value that cannot be assigned to.
So int[4] will effectively become int*const, a constant pointer to int.
Q1:
Types in expression a = a + 1 are:
int[4] = int[4] + int
If we focus on addition first, array decays to pointer:
int[4] = int*const + int
int[4] = int*const // After addition
But now there is a problem:
int*const = int*const
In memory a is an array with 4 ints, and nothing more. There is no place where you could possibly store address with type int*. Compiler will show an error.
Q2:
Types in expression *(a+1)=2 are:
*(int[4] + int) = int
Again, array decays to pointer and addition happens:
*(int*const + int) = int
*(int*const) = int // int* is now equal to &a[1]
Dereferencing int*const is legal. While pointer is constant, value it points to is not:
int = int // Ok, equal types
Types are now perfectly compatible.

Add operation on pointer

int a[3];
int *j;
a[0]=90;
a[1]=91;
a[2]=92;
j=a;
printf("%d",*j);
printf("%d",&a[0]);
printf("%d",&a[1]);
printf("%d",*(j+2));
Here the pointer variable j points to a[0], which is 90, and the address of a[0] is -20 on my machine. So j holds -20.
The address of a[1] is -18. So to get the next element I should use *(j+2), because j+2 should result in -18. But this is not what actually happens: to access a[1], I have to use *(j+1), even though j+1 should be -19. Why does j+1 result in -18?
Addresses are not signed integers. You're printing them with "%d" as if they were an int, but they're not an int. Use "%p" as the format specifier and cast the pointer to void *. That's how you print a pointer value.
Additionally, pointer arithmetic is different from the arithmetic you are used to. Internally, adding one to a pointer p increments the address by sizeof *p bytes, i.e., it advances the pointer to the next object.
This is convenient, as it saves the programmer from having to always use sizeof when performing arithmetic on a pointer (you rarely want to increment by something other than sizeof *p; when you do, you cast to a char * first).
Pointer addition is not the same as simple addition.
It depends on the type of the variable the pointer points to.
In your case it's an int, whose size is machine dependent (you can check it with sizeof(int)).
So when you add a number to a pointer, as in (j+i), the address internally becomes j + i*sizeof(datatype). When you write (j+2), the address is therefore increased by 4 bytes (assuming int is 2 bytes), which is not the intended result.
(j+1) will give you the right result (it's like saying "point to the next element of type int").
The pointer logic works on the basis of the pointer's type: an int pointer moves by the size of an int, which is 2 bytes on your machine, so in terms of addresses p+1 = p + sizeof(type pointed to).
*(x+y) is always exactly equivalent to x[y] (or y[x]). So to print a[1], you want *(j + 1) (or just j[1]). Note that, in a[1], a is converted to a pointer ... there's no difference between the way a is handled and the way j is handled here.
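Putting the answers together, here is a minimal sketch of the question's code with the addresses printed via %p and the step measured in bytes. The function names are hypothetical, and the printed addresses will differ from run to run and machine to machine.

```c
#include <stddef.h>
#include <stdio.h>

/* Bytes between consecutive elements: always sizeof(int), never 1. */
size_t bytes_between_neighbours(void)
{
    int a[3] = { 90, 91, 92 };
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}

/* Reads element i (0..2) through the pointer, as in the question. */
int read_through_pointer(size_t i)
{
    int a[3] = { 90, 91, 92 };
    int *j = a;

    /* %p with a void * argument is the portable way to print addresses. */
    printf("&a[0] = %p, j + %zu = %p\n", (void *)&a[0], i, (void *)(j + i));

    return *(j + i);    /* identical to j[i] and a[i] */
}
```

read_through_pointer(1) returns 91 and read_through_pointer(2) returns 92, while the two printed addresses differ by i * sizeof(int) bytes.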
