Difference between char array and pointer to char array - arrays

While reviewing my code I realized I had placed an extra & while passing a char array to strcpy and missed the resulting warning; regardless, everything worked as expected. I then reproduced the behavior in this example:
#include <string.h>
#include <stdio.h>
void main() {
char test1[32] = {0};
char test2[32] = {0};
strcpy(test1, "Test 1\n");
strcpy(&test2, "Test 2\n");
printf(test1);
printf(test2);
printf("%i %i\n", test2, &test2);
}
Here I copy a string to the address of test2 and the compiler complains accordingly:
main.c: In function ‘main’:
main.c:9:12: warning: passing argument 1 of ‘strcpy’ from incompatible pointer type [-Wincompatible-pointer-types]
9 | strcpy(&test2, "Test 2\n");
| ^~~~~~
| |
| char (*)[32]
In file included from main.c:1:
/usr/include/string.h:125:39: note: expected ‘char * restrict’ but argument is of type ‘char (*)[32]’
125 | extern char *strcpy (char *__restrict __dest, const char *__restrict __src)
| ~~~~~~~~~~~~~~~~~^~~~~~
However the code is still compiled and the result seems to ignore the second level of indirection. Even when printing the address &test2 it is the same as simply test2.
./a.out
Test 1
Test 2
-1990876288 -1990876288
I must admit this is a part of the C language that completely escapes me. Why is the & operand seemingly ignored when targeting an array?

The first byte of the first element in the array is at the same place as the first byte of the array, because they are the same byte.
Most C implementations use the memory address, or some representation of it, of the first byte of an object as a pointer to the object. The array contains its elements, and there is no padding: The first element of the array starts where the array starts. So the first byte in the first element is the first byte in the array. So they have the same memory address.
There is a rule in C that converting a pointer to an object to a char * produces a pointer to the first byte of an object (C 2018 6.3.2.3 7). So, given an array a, (char *) &a[0] is a pointer to the first byte of the first element, and (char *) &a is a pointer to the first byte of the array. These are the same byte, so (char *) &a[0] == (char *) &a.
However, &a[0] and &a have different types. If you attempt to compare them directly with &a[0] == &a, the compiler should issue a warning that the types do not match.
If you pass &a as an argument to a routine that expects &a[0], it will often work in most modern C implementations because they use plain memory addresses as pointers, so &a is represented with the same bits (a memory address) as &a[0], so the receiving routine gets the value it expected even though you passed a pointer of the wrong type. However, the behavior of your program will not be defined by the C standard, since you have violated the rules. This was more of a problem in older C implementations when memory models were not simple flat address spaces, and different types of pointers may have had different representations.

Your code works in this case because, once the pointer argument (with or without the extra &) gets to the strcpy function, it is interpreted as a simple char* value. Thus, any pointer arithmetic (such as the likely increments) performed in that function will be correct.
However, there are cases where using a pointer-to-char and pointer-to-array-of-char will yield significantly different results. Pointer arithmetic is such a case: if p is a char* variable, then ++p will add the size of a char (i.e. 1) to the address stored in p; however, if q is an array of char* pointers, then ++q will add the size of a pointer to the address in q. And, if r is the address of an array of character strings, then ++r will add the size of the entire array to the address stored in r.
So, it's good that the compiler warns you about that extra &. Be very careful about addressing (no pun intended) such issues, if ever your compiler warns you about them.

Why is the & operand seemingly ignored when targeting an array?
The conversion of char (*)[32] to char * is UB.
Is is not ignored by the compiler, hence the warning.
The compiler emitted code did convert the pointer from one type to the other in a common fashion resulting in acceptable behavior. Still remains UB.
Best if the programmer does not ignore the warning.

Related

How do I force a warning from using an array of wrong size when passed to function?

Let's say you have a function taking a string as an argument:
void foo(char *arg);
If we know for certain that the array (not to be confused with string length, thanks chux) will always have a certain size, let's say 8, then we can instead do:
void bar(char (*arg)[8]);
and then call it like this:
char str[8] = "Hello";
bar(&str);
We need to add the & for this to work properly, but the above code will emit a warning if you pass an array of wrong size or type, which is exactly what I want to achieve. But we will obviously need to modify the body a bit. So my question is simply if this wrapper technique would work:
void bar(char (*arg)[8]) {
char *tmp = (char*) arg;
foo(tmp);
}
What I'm trying to achieve here is that warnings should be emitted if called with an array of wrong size. Is the above solution safe? Is it safe to cast pointer to array of char to pointer to char?
I tried it, and it works, and emits no warnings with -Wall -Wextra -pedantic. And as soon as I change the size of str I get:
<source>: In function 'main':
<source>:18:9: warning: passing argument 1 of 'bar' from incompatible pointer type [-Wincompatible-pointer-types]
18 | bar(&str);
| ^~~~
| |
| char (*)[9]
<source>:9:17: note: expected 'char (*)[8]' but argument is of type 'char (*)[9]'
9 | void bar(char (*arg)[8]) {
| ~~~~~~~^~~~~~~
which is exactly what I want. But is it safe, or is it UB? I would like to do this, not only via a wrapper, but also by rewriting the original function, like
void foo(char (*argaux)[8]) {
char *arg = *argaux;
// Copy body of original foo
I know that I can achieve basically the same thing using structs, but I wanted to avoid that.
Runnable code: https://godbolt.org/z/GnaP5ceMr
char *tmp = (char*) arg; is wrong, these are not compatible pointer types. You can fix this easily though:
char *tmp = *arg;
*arg gives a char[8] which then decays into a pointer to its first element. This is safe and well-defined. And yes, pointers have much stronger "typing" in C than pass-by-value, so the compiler will recognize if an array of wrong size is passed.
Please note however that this leads to other problems: you can no longer have const correctness.
See Const correctness for array pointers?
This is not safe:
char *tmp = (char*) arg;
Because you're attempting to convert a char (*)[8] to a char *. While you might get away with it since a pointer to an array will (at least on x86-64) have the same numeric value as a pointer to the first member of an array, the standard doesn't guarantee that it will work. You would first need to dereference the parameter:
char *tmp = *arg;
In theory you should be able to do this:
void foo(char arg[static 8]);
This means that arg must be an array of at least that size.
The description of this syntax is in section 6.7.6.3p7 of the C standard:
A declaration of a parameter as ‘‘array of type’’ shall be adjusted
to ‘‘qualified pointer to type’’, where the type qualifiers (if
any) are those specified within the [ and ] of the array
type derivation. If the keyword static also appears within the
[ and ] of the array type derivation, then for each call
to the function, the value of the corresponding actual argument
shall provide access to the first element of an array with at least as
many elements as specified by the size expression.
However, most implementations don't enforce this restriction and it doesn't prevent you from passing an array larger than expected.

Pointer to int == Pointer to char (somewhat)?

In this code given below , i have declared a pointer to int and we all know that memcpy returns a void pointer to destination string , so if ptr is a pointer to int then why printf("%s",ptr); is totally valid , ptr is not a pointer to char after all.
#include <stdio.h>
#include <string.h>
//Compiler version gcc 6.3.0
int main()
{
char a1[20] ={0} , a2[20] ={0};
int *ptr;
fgets(a1,20,stdin);
fgets(a2,20,stdin);
ptr = memcpy(a1,a2,strlen(a2)-1);
printf("%s \n",ptr);
if(ptr)
printf("%s",a1);
return 0;
}
First consider ptr = memcpy(a1,a2,strlen(a2)-1);. memcpy is declared as void *memcpy(void * restrict, const void * restrict, size_t), so it accepts the a1 and a2 passed to it because pointers to any unqualified object type may be converted to void * or to const void *. (Pointers to object types qualified with const may also converted to const void *.) This follows from the rules for function calls in C 2018 6.5.2.2 7 (arguments are converted to the parameter types as if by assignment) and 6.5.16 1 (one operand is a possibly-qualified void * and the left has all the qualifiers of the right) and 6.5.16 2 (the right operand is converted to the type of the left).
Then memcpy returns a void * that is its first argument (after conversion to void *), and we attempt to assign this to ptr. This satisfies the constraints of the assignment (one of the operands is a void *), so it converts the pointer to the type of ptr, which is int *. This is governed by 6.3.2.3 7:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer…
Since a1 is a char array with no alignment requested, it could have any alignment. It might not be suitable for an int. If so, then the C standard does not define the behavior of the program, per the above.
If a1 happens to be suitably aligned for an int or the C implementation successfully converts it anyway, we go on to printf("%s \n",ptr);.
printf is declared as int printf(const char * restrict, ...). For arguments corresponding to ..., there is no parameter type to convert to. Instead, the default argument promotions are performed. These affect integer and float arguments but not pointer arguments. So ptr is passed to printf unchanged, as an int *.
For a %s conversion, the printf rules in 7.21.6.1 8 say “the argument shall be a pointer to the initial element of an array of character type.” While ptr is pointing to the same place in memory as the initial element, it is a pointer to an int, not a pointer to the initial element. Therefore, it is the wrong type of argument.
7.21.6.1 9 says “… If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.” Therefore, the C standard does not define the behavior of this program.
In many C implementations, pointers are simple addresses in memory, int * and char * have the same representation, and the compiler will tolerate passing an int * for a %s conversion. In this case, printf receives the address it is expecting and will print the string in a1. That is why you observed the result you did. The C standard does not require this behavior. Because printf is part of the standard C library, the C standard permits a compiler to treat it specially when it is called with external linkage. The compiler could, hypothetically, treat the argument as having the correct type (even though it does not) and change the printf call into a loop that used ptr as if it were a char *. I am not aware of any compilers that would generate undesired code in this case, but the point is the C standard does not prohibit it.
why printf("%s",ptr); is totally valid
It isn’t - it may work as expected, but it isn’t guaranteed to. By passing an argument of the wrong type to printf, you’ve invoked undefined behavior, which simply means the compiler isn’t required to handle the situation in any particular way. You may get the expected output, you may get garbage output, you may get a runtime error, you may corrupt the state of your system, you may open a black hole to the other side of the universe.

Why do I get warnings when I try to assign the address of a variable to a pointer that was declared to point to a variable of a different type?

Take a look at the following program. What I don't understand is why do I have to cast the address of the variable x to char* when it actually would be absolutely useless if you think about it for a second. All I really need is only the address of the variable and all the necessary type information is already in place provided by the declaration statement char* ptr.
#include <stdio.h>
int main(void) {
int x = 0x01020309;
char* ptr = &x; /* The GCC compiler is going to complain here. It will
say the following: "warning: initialization from
incompatible pointer type [enabled by default]". I
need to use the cast operator (char*) to make the
compiler happy. But why? */
/* char* ptr = (char*) &x; */ /* this will make the compiler happy */
printf("%d\n", *ptr); /* Will print 9 on a little-endian machine */
return 0;
}
The C Standard, 6.2.5 Types, paragraph 28 states:
A pointer to void shall have the same representation and
alignment requirements as a pointer to a character type.
Similarly, pointers to qualified or unqualified versions of
compatible types shall have the same representation and
alignment requirements. All pointers to structure types shall have
the same representation and alignment requirements as each other.
All pointers to union types shall have the same
representation and alignment requirements as each other.
Pointers to other types need not have the same representation or alignment requirements.
Since different types of pointers can have differing implementations or constraints, you can't assume it's safe to convert from one type to another.
For example:
char a;
int *p = &a
If the implementation has an alignment restriction on int, but not on char, that would result in a program that could fail to run.
This is because pointers of different types point to blocks of memory of different sizes even if they point to the same location.
&x is of type int* which tells the compiler the number of bytes (depending on sizeof(int)) to read when getting data.
Printing *(&x) will return the original value you entered for x
Now if you just do char* ptr = &x; the compiler assigns the address in &x to your new pointer (it can as they are both pointers) but it warns you that you are changing the size of the block of memory being addressed as a char is only 1 byte. If you cast it you are telling the compiler that this is what you intend.
Printing *(ptr) will return only the first byte of the value of x.
You are correct that it makes no practical difference. The warning is there to inform you that there might be something fishy with that assignment.
C has fairly strong type-checking, so most compilers will issue a warning when the types are not compatible.
You can get rid of the warning by adding an explicit cast (char*), which is you saying:
I know what I'm doing, I want to assign this value to my char* pointer even if the types don't match.
Its just simple as you assign integer type to character. similarly you are trying to assign integer type pointer to character type pointer.
Now why is so because this is how c works, if you increment a character pointer it will give you one byte next address and incrementing integer pointer will give you 2 byte next address.
According to your code, x is of type int. So the pointer that points to x should be of type int *. Compiler gives such error because you use a pointer which is not int *.
So make your pointer either int *, or void * then you don't need cast.

Getting warning when using `addres of` operator with array as an initializer to a pointer

I am learning C and I am having some troubles to truly understand arrays and pointers in C, after a lot of research (reading the K&R book (still reading it), reading some articles and some answers here on SO) and taking tons of notes along the way, I've decided to make a little test / program to prove my understanding and to clear some doubts, but I'm yet confused by the warning I am getting using this code:
#include <stdio.h>
int main(int argc, char **argv) {
char a[3] = {1, 2, 3};
char *p = &a;
printf("%p\n", a); /* Memory address of the first element of the array */
printf("%p\n", &a); /* Memory address of the first element of the array */
printf("%d\n", *a); /* 1 */
}
Output compiling with GNU GCC v4.8.3:
warning: initialization from incompatible pointer type
char *p = &a;
^
0x7fffe526d090
0x7fffe526d090
1
Output compiling with VC++:
warning C4047: 'initializing' : 'char *' differs in levels of indirection from 'char (*)[3]'
00000000002EFBE0
00000000002EFBE0
1
Why am I getting this warning if &a and a are supposed to have the same value and compiling without warnings when using just a as an initializer for p?
&a is the memory address of the entire array. It has the same value as the address of the first element (since the whole array starts with the first element), but its type is a "pointer to array" and not "pointer to element". Hence the warning.
You are trying to initialize a char* with a pointer to a char[3], which means a char(*)[3].
Unsurprisingly, the compiler complains about the mismatch.
What you wanted to do is take advantage of array-decay (an array-name decays in most contexts to a pointer to its first element):
char* p = a;
As an aside, %p is only for void*, though that mismatch, despite being officially Undefined Behavior, is probably harmless.

Why typecasting between char *ptr to int and int *ptr to char works completely fine? when they are designed to point to specific datatyped variables

I have just started learning pointers. I have some questions regarding pointers typecasting. Consider below program.
int main(){
int a = 0xff01;
char *s = &a;
char *t = (int *) &a;
printf("%x",*(int *)s);
printf(" %x",*(int *)t);
return 0;
}
The statement char *s = &a gives
warning: incompatible pointer conversion type.
But noticed that the two printf() statements works fine they give me the right output. The question is
char *t , char *s both are pointers to character type.
Why does 'C' compilers lets me to assign integer variable to char *p ? why dont they raise an error and restrict the programmer?
We have int *ptr to point to integer variables, then why do they still allow programmer to make char *ptr point to integer variable?
// Another sample code
char s = 0x02;
int *ptr = (char *)&s;
printf("%x",*(char *)ptr); // prints the right output
Why does an int *ptr made point to character type? it works. why compiler dont restrict me?
I really think this leads me to confusion. If the pointer types are interchangeable with a typecast then what is the point to have two different pointers char *ptr , int *ptr ?
when we could retrieve values using (int *) or (char *).
All pointers are of same size 4bytes(on 32bite machine). Then one could use void pointer.
Yes people told me, void pointers always needs typecasting when retrieving values from memory. When we know the type of variable we go for that specific pointer which eliminates the use of casting.
int a = 0x04;
int *ptr = &a;
void *p = &a;
printf("%x",*ptr); // does not require typecasting.
printf("%x",*(int *)p); // requires typecasting.
Yes, I have read that back in old days char *ptrs played role of void pointers. Is this one good reason? why still compilers support typecasting between pointers? Any help is greatly appreciated.
Compiling with GCC 4.9.1 on Mac OS X 10.9.5, using this mildly modified version of your code (different definition of main() so it compiles with my preferred compiler options, and included <stdio.h> which I assume was omitted for brevity in the question — nothing critical) in file ptr.c:
#include <stdio.h>
int main(void)
{
int a = 0xff01;
char *s = &a;
char *t = (int *) &a;
printf("%x",*(int *)s);
printf(" %x",*(int *)t);
return 0;
}
I get the compilation errors:
$ gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
-Wold-style-definition -Werror ptr.c -o ptr
ptr.c: In function ‘main’:
ptr.c:6:15: error: initialization from incompatible pointer type [-Werror]
char *s = &a;
^
ptr.c:7:14: error: initialization from incompatible pointer type [-Werror]
char *t = (int *) &a;
^
cc1: all warnings being treated as errors
$
So, both assignments are the source of a warning; my compilation options turn that into an error.
All pointers other than void * are typed; they point to an object of a particular type. Void pointers don't point to any type of object and must be cast to a type before they can be dereferenced.
In particular, char * and int * point to different types of data, and even when they hold the same address, they are not the same pointer. Under normal circumstances (most systems, most compilers — but there are probably exceptions if you work hard enough, but you're unlikely to be running into one of them)…as I was saying, under normal circumstances, the types char * and int * are not compatible because they point to different types.
Given:
int data = 0xFF01;
int *ip = &data;
char *cp = (char *)&data;
the code would compile without complaint. The int data line is clearly unexceptional (unless you happen to have 16-bit int types — but I will assume 32-bit systems). The int *ip line assigns the address of data to ip; that is assigning a pointer to int to a pointer to int, so no cast is necessary.
The char *cp line forces the compiler's hand to treat the address of data as a char pointer. On most modern systems, the value in cp is the same as the value in ip. On the system I learned C on (ICL Perq), the value of a char * address to a memory location was different from the 'anything else pointer' address to the same memory location. The machine was word-oriented, and byte-aligned addresses had extra bits set in the high end of the address. (This was in the days when the expansion of memory from 1 MiB to 2 MiB made a vast improvement because 750 KiB were used by the O/S, so we actually got about 5 times as much memory after as before for programs to use! Gigabytes and gibibytes were both fantasies, whether for disk or memory.)
Your code is:
int a = 0xff01;
char *s = &a;
char *t = (int *) &a;
Both the assignments have an int * on the RHS. The cast in the second line is superfluous; the type of &a is int * both before and after the cast. Assigning an int * to a char * is a warnable offense — hence the compiler warned. The types are different. Had you written char *t = (char *)&a;, then you would have gotten no warning from the compiler.
The printing code works because you take the char * values that were assigned to s and t and convert them back to the original int * before dereferencing them. This will usually work; the standard guarantees it for conversions to void * (instead of char *), but in practice it will normally work for anything * where anything is an object type, not a function type. (You are not guaranteed to be able to convert function pointers to data pointers and back again.)
The statement char *s = &a gives
warning: incompatible pointer conversion type.
In this case, the warning indicates a constraint violation: A compiler must complain and may refuse to compile. For initialization (btw, a declaration is not a statement), the same conversion rules as for assignment apply, and there is no implicit conversion from int * to char * (or the other way round). That is, an explicit cast is required:
char *s = (char *)&a;
Why do C compilers let me assign an integer variable to char *p? Why don’t they raise an error and restrict the programmer?
Well, you’ve got a warning. At the very least, a warning means you must understand why it is there before you ignore it. And as said above, in this case a compiler may refuse to compile.*)
We have int *ptr to point to integer variables, then why do they still allow programmer to make char *ptr point to integer variable?
Pointers to a character type are special, they are allowed to point to objects of every type. That you’re allowed to do so, doesn’t mean it’s a good idea, the cast is required to keep you from doing such a conversion accidently. For pointer-to-pointer conversions in general, see below.
int *ptr = (char *)&s;
Here, ptr is of type int *, and is initialized with a value of type char *. This is, again, a constraint violation.
printf("%x",*(char *)ptr); // prints the right output
If a conversion from a pointer to another is valid, the conversion back also is and always yields the original value.
If the pointer types are interchangeable with a typecast then what is the point to have two different pointers char *ptr, int *ptr?
Types exist to save you from errors. Casts exist to give you a way to tell the compiler that you know what you’re doing.
All pointers are of same size 4bytes(on 32bite machine). Then one could use void pointer.
That’s true for many architectures, but quite not for all the C standard addresses. Having only void pointers would be pretty useless, as you cannot really do anything with them: no arithmetic, no dereferencing.
Yes, I have read that back in old days char *ptrs played role of void pointers. Is this one good reason?
Perhaps a reason. (If a good one, is another question…)
When pointer-to-pointer conversions are allowed:
C11 (N1570) 6.3.2.3 p7
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned×) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
×) In general, the concept “correctly aligned” is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
Pointers to character types and pointers to void, as mentioned above, are always correctly aligned (and so are int8_t and uint8_t if they exist). There are platforms, on which a conversion from an arbitrary char pointer to an int pointer may violate alignment restrictions and cause a crash if executed.
If a converted pointer satisfies alignment requirements, this does not imply that it’s allowed to dereference that pointer, the only guarantee is that it’s allowed to convert it back to what it originally pointed to. For more information, look for strict-aliasing; in short, this means you’re not allowed to access an object with an expression of the wrong type, the only exception being the character types.
*) I don’t know the reasons in your particular case, but as an example, where it’s useful to give implementations such latitude in how to treat ill-formed programmes, see for example object-pointer-to-function-pointer conversions: They are a constraint violation (so they require a diagnostic message from the compiler) but are valid on POSIX systems (which guarantess well-defined semantics for such conversions). If the C standard required a conforming implementation to abort compilation, POSIX had to contradict ISO C (cf. POSIX dlsym for an example why these conversions can be useful), which it explicitly doesn’t intend to.
Pointers are not having any types, types described with pointer in program actually means that to which kind of data pointer is pointing. Pointers will be of same size.
When you write,
char *ptr;
it means that is is pointer to character type data and when dereferenced, it will fetch one bytes data from memory
Similarly,
double *ptr;
is pointer to double type data. So when you dereference, they will fetch 8 bytes starting from the location pointed by pointer.
Now remember that all the pointer are of 4 bytes on 32 bit machines irrespective of the type of data to which it is pointing. So if you store integer variable's address to a pointer which is pointing to character, it is completely legal and if you dereference it, it will get only one byte from memory. That is lowest byte of integer on little endian pc and highest byte of integer on big endian pc.
Now you are type casting your pointer to int type explicitly. So while dereferencing it will get while integer and print it. There is nothing wrong with this and this is how pointers work in c.
In your second example you are doing the same. Assigning address of character type variable to pointer which is pointing to integer. Again you are type casting pointer to character type so by dereference it will get only one byte from that location which is your character.
And frankly speaking, i dont know any practical usage of void pointer but as far as i know, void pointers are used when many type of data is to be dereferenced using a single pointer.
Consider that you want to store an integer variable's address to pointer. So you will declare pointer of integer type. Now later in program there is a need to store a double variable's address to pointer. So instead of declaring a new pointer you store its address in int type pointer then if you dereference using it, there will be a big problem and result in logical error which may get unnoticed by you if you have forgot to type cast it to double type. This is not case with void pointer. If you use void pointer, you have to compulsarily type cast it to particular type inorder to fetch data from memory. Otherwise compiler will show error. So in such cases using void pointer reminds you that you have to type cast it to proper type every time otherwise compiler will show you error. But no error will be shown in previous case

Resources