Applying strcpy to a pointer to array + some_offset - c

This is a language-lawyer question.
The description of the strcpy function is given at 7.24.2.3(p2):
The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by s1.
So consider the following code:
char test[8] = "123";
strcpy(test + 3, "4567");
printf("%s\n", test); //1234567
It works as expected, but I'm confused about the object pointed to by test + 3. It is clear that the object test has declared type char[8]. But as far as I can see, the Standard does not explicitly state anything like this:
"If we have an array of n elements, then a pointer to its i-th element (i < n) can be considered a pointer to the first element of an array of n - i elements".
Since the function strcpy requires its first operand to be an array, can we, pedantically speaking, apply pointer arithmetic as I showed above?

Of course s1 does not point to an array, it points to a char. But this usage of the term comes from 7.1.4p1:
[...] If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid. [...]
For strcpy(test + 3, "4567"); the accesses test[3 + 0] through test[3 + 4] shall be valid, which is the case, among other possibilities, if test is an array of at least 8 characters.

Yes, because even though the semantics of the function strcpy only make sense when dealing with arrays instead of pointers, there is no way for strcpy (or any other function) to tell whether it was passed an array or a pointer to an array, since when an array is passed as an argument to a function, it decays to a pointer to its first element.

Related

Is `*((*(&array + 1)) - 1)` safe to use to get the last element of an automatic array?

Suppose I want to get the last element of an automatic array whose size is unknown. I know that I can make use of the sizeof operator to get the size of the array and get the last element accordingly.
Is using *((*(&array + 1)) - 1) safe?
Like:
char array[SOME_SIZE] = { ... };
printf("Last element = %c", *((*(&array + 1)) - 1));
int array[SOME_SIZE] = { ... };
printf("Last element = %d", *((*(&array + 1)) - 1));
etc
No, it is not.
&array is of type pointer to char[SOME_SIZE] (in the first example given). This means &array + 1 points to memory immediately past the end of array. Dereferencing that, as in *(&array + 1), gives undefined behaviour.
No need to analyse further. Once there is any part of an expression that gives undefined behaviour, the whole expression does.
I don't think it is safe.
From the standard, as @dasblinkenlight quoted in his answer (now removed), there is also something I would like to add:
C99 Section 6.5.6.8 -
[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
So, as it says, we should not write *(&array + 1), because &array + 1 points one past the last element of the array and so * should not be applied to it.
It is also well known that dereferencing a pointer to a memory location you are not allowed to access leads to undefined behaviour.
I believe it's undefined behavior for the reasons Peter mentions in his answer.
There is a huge debate going on about *(&array + 1). On the one hand, dereferencing &array + 1 seems to be legal because it's only changing the type from T (*)[] back to T [], but on the other hand, it's still a pointer to uninitialized, unused and unallocated memory.
My answer relies on the following:
C99 6.5.6.7 (Semantics of additive operators)
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
Since &array is not a pointer to an object that is an element of an array, then according to this, it means that the code is equivalent to:
char array_equiv[1][SOME_SIZE] = { ... };
/* ... */
printf("Last element = %c", *((*(&array_equiv[0] + 1)) - 1));
That is, &array is a pointer to an array of SOME_SIZE chars, so it behaves the same as a pointer to the first element of an array of length 1 where each element is an array of SOME_SIZE chars.
Now, that together with the clause that follows (already mentioned in other answers; this exact excerpt is blatantly stolen from ameyCU's answer):
C99 Section 6.5.6.8 -
[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
Makes it pretty clear that it is UB: it's equivalent to dereferencing a pointer that points one past the last element of array_equiv.
Yes, in the real world it probably works, as in reality the original code doesn't really dereference a memory location; it's mostly a type conversion from T (*)[] to T []. But I'm pretty sure that from a strict standard-compliance point of view, it is undefined behavior.
It is probably safe, but there are some caveats.
Suppose we have
T array[LEN];
Then &array is of type T(*)[LEN].
Next, &array + 1 is again of type T(*)[LEN], pointing just past the end of the original array.
Next, *(&array + 1) is of type T[LEN], which may be implicitly converted to T*, still pointing just past the end of the original array. (So we did NOT dereference an invalid memory location: the * operator is not evaluated).
Next, *(&array + 1) - 1 is of type T*, pointing at the last array location.
Finally, we dereference this (which is legitimate if the array length is not zero): *(*(&array + 1) - 1) gives the last array element, a value of type T.
Note that the only time we actually dereference a pointer is in this last step.
Now, the potential caveats.
First, *(&array + 1) formally appears like an attempt to dereference a pointer that points to an invalid memory location. But it really isn't. That's the nature of array pointers: this formal dereference only changes the type of the pointer, and does not actually result in an attempt to retrieve a value from the referenced location. That is, array is of type T[LEN] but it may be implicitly converted to type T*, pointing to the first element of the array; &array is a pointer to type T[LEN], pointing at the beginning of the array; *(&array+1) is again of type T[LEN] which may be implicitly converted to type T*. At no point is a pointer actually dereferenced.
Second, &array + 1 may appear to be an invalid address, but it really isn't: my C++11 reference manual tells me explicitly that "Taking a pointer to the element one beyond the end of an array is guaranteed to work", and a similar statement is also made in K&R, so I believe it has always been standard behavior.
Finally, in case of a zero-length array, the expression dereferences the memory location just before the array, which may be unallocated/invalid. But this issue would also arise if one used a more conventional approach using sizeof() without testing for nonzero length first.
In short, I do not believe there is anything undefined or implementation-dependent about this expression's behavior.
IMHO that might work but is probably unwise. You should carefully review your software design and ask yourself why you want the last entry of the array. Is the content of the array completely unknown to you, or is it possible to define the structure in terms of C structs and unions? If that is the case, stay away from complex pointer operations on, for example, a char array, and define the data properly in your C code, in structs and unions wherever possible.
So instead of :
printf("Last element = %c", *((*(&array + 1)) - 1));
It could be :
printf("Checksum = %c", myStruct.MyUnion.Checksum);
This clarifies your code. The last letter in your array means nothing to a person not familiar with what's in this array. myStruct.MyUnion.Checksum makes sense to anyone. Studying the myStruct structure could explain the whole data structure to anyone. Please use something like that if it can be declared in such a way. If you are in the rare situation where you cannot, study the answers above; they make good sense, I think.
a)
If both the pointer operand and the result [of P + N] point to
elements of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
[...]
if the expression P points either to an element of an array
object or one past the last element of an array object, and the
expression Q points to the last element of the same array object, the
expression ((Q)+1)−(P) has the same value as ((Q)−(P))+1 and as
−((P)−((Q)+1)), and has the value zero if the expression P points one
past the last element of the array object, even though the expression
(Q)+1 does not point to an element of the array object.
This states that computations using a pointer one past the last array element are actually completely fine. As some people here have written that the use of non-existent objects for computations is already illegal, I thought I would include that part.
Then we need to take care about this part:
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is
evaluated.
There is one important part that the other answers omitted and that is:
If the pointer operand points to an element of an array object
This is not the case here. The pointer operand we dereference is not a pointer to an element of an array object, it is a pointer to a pointer. So this whole clause is completely irrelevant. But the standard also states:
For the purposes of these [additive] operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
What does this mean?
It means our pointer to a pointer is actually again a pointer to an array - of length[1]. And now we can close the loop, because as the first paragraph states, we are allowed to make calculations with one past the array, so we are allowed to make calculations with the array as if it would be an array of length[2]!
In a more graphical way:
ptr -> (ptr to int[10])[0] -> int[10]
-> (ptr to int[10])[1]
So, we are allowed to make calculations with (ptr to int[10])[1], even though it is technically outside the array of length[1].
b)
The steps that happen are:
array: ptr of type int[SOME_SIZE] to the first element of array
&array: ptr to a ptr of type int[SOME_SIZE] to the first element of array
+ 1: ptr, one more than the ptr of type int[SOME_SIZE] to the first element of array, to a ptr of type int
(This is NOT yet a pointer to int[SOME_SIZE+1], according to C99 Section 6.5.6.8. This is NOT yet ptr + SOME_SIZE + 1.)
*: We dereference the pointer to the pointer. NOW, after the dereferencing, we have a pointer according to C99 Section 6.5.6.8, which is past the last element of the array and which is not allowed to be dereferenced. This pointer is allowed to exist and we are allowed to use operators on it, except the unary * operator. But we don't use that one on that pointer yet.
- 1: Now we subtract one from the ptr of type int to one after the last element of the array, letting ptr point to the last element of the array.
*: dereferencing a ptr to int to the last element of the array, which is legal.
c)
And last, but not least:
If it were illegal, then the offsetof macro would be illegal too, which is commonly defined as:
((size_t)(&((st *)0)->m))

Why is setting an array of characters to NULL illegal? Passing to function changes behavior

The name of an array is a synonym for the address of the first element of the array, so why can't this address be set to NULL? Is it a language rule to prevent a memory leak?
Also, when we pass an array to a function, its behavior changes and it becomes possible to set it to NULL.
I don't understand why this occurs. I know it has something to do with pointers, but I just can't wrap my mind around it.
Example:
#include <stdio.h>

void some_function(char string[]);

int main()
{
    char string[] = "Some string!";
    some_function(string);
    printf("%s\n", string);
    return 0;
}

void some_function(char string[])
{
    string = NULL;  /* only changes the function's local parameter */
}
Output: Some string!
I read that when an array is passed into a function, what's actually passed are pointers to each element, but wouldn't the name of the array itself still be a synonym for the address of the first element? Why is setting it to NULL here even allowed, but not in the main function?
Is it at all possible to set an array to NULL?
An array is not a pointer - the symbol string in your case has attributes of address and size whereas a pointer has only an address attribute. Because an array has an address it can be converted to or interpreted as a pointer, and the language supports this implicitly in a number of cases.
When interpreted as a pointer you should consider its type to be char* const - i.e. a constant pointer to variable data, so the address cannot be changed.
In the case of passing the array to a function, you have to understand that arrays are not first class data types in C, and that they are passed by reference (i.e. as a pointer), losing the size information. The pointer passed to the function is not the array, but a pointer to the array's first element; it is a variable independent of the original array.
You can illustrate what is effectively happening without the added confusion of function call semantics by declaring:
char string[] = "Some string!";
char* pstring = string ;
then doing:
pstring = NULL ;
Critically, the original array data cannot just "disappear" while it is in scope (or at all if it were static); the content of the array is the variable, whereas a pointer is a variable that refers to data. A pointer implements indirection, an array does not. When an array is passed to a function, indirection occurs and a pointer to the array is passed rather than a copy of the array.
Incidentally, to pass an array (which is not a first class data type) by copy to a function, you must wrap it within a struct (structs in C are first class data types). This is largely down to the original design of C under the constraints of systems with limited memory resources and the need to maintain compatibility with early implementations and large bodies of legacy code.
So the fact that you cannot assign a pointer to an array is hardly the surprising part - because to do so makes little sense. What is surprising perhaps is the semantics of "passing an array" and the fact that an array is not a first class data type; leading perhaps to your confusion on the matter.
You can't rebind an array variable. An array is not a pointer. True, at a low level they are approximately similar, except pointers have no associated dimension / rank information.
You can't assign NULL to the actual array (same scope), but you can assign to a parameter since C treats it like a pointer.
The standard says:
7 A declaration of a parameter as ‘‘array of type’’ shall be adjusted
to ‘‘qualified pointer to type’’,
So in the function the NULL assignment is legal.

Accessing arrays in an array of pointers

Let's say I have an array of pointers in C. For instance:
char** strings
Each pointer in the array points to a string of a different length.
If I write, for example, strings + 2, will I get to the third string, although the lengths may differ?
Yes, you will (assuming that the array has been filled correctly). Imagine the double pointer situation as a table. You then have the following, where each string is at a completely different memory address. Please note that all addresses have been made up, and probably won't be real in any system.
strings[0] = 0x1000000
strings[1] = 0xF0
...
strings[n] = 0x5607
0x1000000 -> "Hello"
0xF0 -> "World"
Note here that none of the actual text is stored in the strings. The storage at those addresses will contain the actual text though.
For this reason, strings + 2 will add two to the strings pointer, which will yield the address of strings[2]; dereferencing that yields strings[2] itself, which is a memory address that can then be used to access the string.
strings + 2 is the address of the 3rd element of the buffer pointed to by strings.
*(strings + 2) or strings[2] is the 3rd element which is again a pointer to a buffer of characters.
I think you are looking to access the third element through the expression
strings[2];
but this will not give you what you expect; look at the type of the expression strings[2]:
Its type is char *.
According to the standard:
An n-element array of type t decays into a pointer to t, except when the expression is the operand of the '&' operator or the 'sizeof' operator.
So strings[2] is equivalent to *(strings + 2); it yields the contents of the third location of the pointer-to-pointer, which is itself a pointer, i.e. an address.
But
strings+2;
whose type is char **, yields the address of the 3rd location, i.e. the address of the 3rd element of the array of pointers whose base address is stored in strings.
But in your question you have not shown any assignment to char **strings, so I am answering on the assumption that it has been initialised to point to a particular array of pointers.
As the question stands, it makes no sense to evaluate
*(strings + 2)
because strings is not initialised.

Pass an array to function using pointer in C?

First I declare an array a with 10 elements. Then I call the function bubbleSort
bubbleSort( a, 10);
where bubbleSort is a function declared as
void bubbleSort(int* const array, const int size)
My question is: if "array" is a pointer, which means it stores the address of array a (array = &a[0]), then how should we understand the terms array[1], array[2], array[3]... in the function bubbleSort?
It is the bubble sort program and this part is very confusing for me.
array[1] means, by definition in the C standard, *(array+1). So, if array is a pointer, this expression adds one element to the pointer, then uses the result to access the pointed-to object.
When a is an array, you may be used to thinking of a[0], a[1], a[2], and so on as elements of the array. But they actually go through the same process as with the pointer above, with one extra step. When the compiler sees a[1] and a is an array, the compiler first converts the array into a pointer to its first element. This is a rule in the C standard. So a[1] is actually (&a[0])[1]. Then the definition above applies: (&a[0])[1] is *(&a[0] + 1), so it means “Take the address of a[0], add one element, and access the object the result points to.”
Thus, a[1] in the calling code and array[1] in the called code have the same result, even though one starts with an array and the other uses a pointer. Both use the address of the first element of the array, add one element, and access the object at the resulting address.
C defines operations of addition and subtraction of integers and pointers, collectively called pointer arithmetic. The language specification says that adding N to a pointer is equivalent to advancing the pointer by N units of memory, each equal to the size of an object pointed to by the pointer. For example, adding ten to an int pointer is the same as advancing it by ten sizes of int; adding ten to a double pointer is equivalent to advancing the pointer by ten sizes of double, and so on.
Next, the language defines array subscript operations in terms of pointer arithmetic: when you write array[index], the language treats it as an equivalent of *((&array[0])+index).
At this point, the language has everything necessary to pass arrays as pointers: take &array[0], pass it to the function, and let the function use array subscript operator on the pointer. The effect is the same as if the array itself has been passed, except the size of the array is no longer available. The structure of your API indirectly acknowledges that by passing the size of the array as a separate parameter.
You have an array of int, identified by the address of its first element.
array[1] is equivalent to *(array + 1), which means "the value of what is pointed to by array plus the size of one element", where the element type is known to be int because you prototyped the parameter as int *.
When you declare a to be an array of size 10, the compiler reserves storage for ten ints; in most expressions the name a is converted to the address of a[0], and since the memory is allocated contiguously you can access the subsequent integers by using a[2], a[4] etc. Now when you pass a to the parameter array, it is actually the address that gets copied, and therefore you can access the integers using array[0], array[1] etc.

question regarding pointer in c language

char *sample = "String Value";
&sample is a pointer to the pointer to "String Value".
Is the above statement right?
If the above statement is right, what is the equivalent of &sample if my declaration is
char sample[] = "String Value";
In the first one, there are two objects being created.
One is a char * (pointer-to-char) called sample, and the other is an unnamed array of 13 chars containing the characters of the string. In this case, &sample gives the address of the object sample, which is the address of a pointer-to-char - so, a pointer-to-pointer-to-char.
In the second example, there's only one object being created; an array of 13 chars called sample, initialised with the characters of the string. In this case, &sample gives the address of the object sample - so, a pointer-to-array-of-13-chars.
In the second example, there is no "equivalent" to &sample in the first example, in the sense of a pointer-to-pointer-to-char value. This is because there is no pointer-to-char value to take the address of. There is only the array.
While pointers provide enormous power and flexibility to programmers, they may cause malfunctions if they are not properly handled. Consider the following precautions when using pointers to prevent errors. We should make sure that we know where each pointer is pointing in a program. Here are some general observations and common errors that might be useful to remember: *ptr++, *p[], (ptr).member
In the first case &sample will return the address of the pointer sample that was created, and in the second case the starting address of the array object that holds the string.
In C arrays and pointers are more or less interchangeable. You can treat an array name like it is a pointer, and a pointer like it is an array name.
If you take the address (&) of a pointer, of course you get a pointer to a pointer.
&sample is the address of the pointer that points to "String Value".
For the second example, since an array name that is not followed by a subscript is interpreted as the pointer to the initial element of the array, which means
sample
and
&sample[0]
are the same. &sample, however, is not the address of a pointer: it is the address of the array itself, which has the same value as &sample[0] but the type pointer-to-array.

Resources