I'm confused about the way C handles strings and char * vs char[].
char name[10] = "asd";
printf("%p\n%p", &name, &name[0]); //0x7ffed617acd
//0x7ffed617acd
If this code gives the same addresses for both arguments, does it mean that the C compiler takes char arrays (strings) as a pointer to the first char in the array and moves in the memory till it gets the null terminator? Why wouldn't the same happen if we changed the char name[] to char *name? (I know they differ but what makes C take both in a different way?)
I know that arrays can't be assigned after declaration (unless you used something like strcpy, strcat) which is also confusing. Why wouldn't C take them as any other data type? (Something tells me the compiler has a specific address for the array, while you can assign a char* to whatever location in memory since it's a pointer.)
I know that char * have fixed size unlike char[] which makes char * not usable for first argument of strcat.
In C, a "string" is an array of char terminated with \0.
When you refer to an array in an expression, it is in most contexts converted to a pointer to its first element, in this case a char *.
According to the C standard, an expression of array type is converted to a pointer to the first element, except when it is the operand of sizeof or unary &, or a string literal used to initialize an array. The array itself is not a pointer.
Being able to write name instead of &name[0] is a result of this conversion.
In the same way, accessing an array element by writing name[i] is equivalent to writing *(name+i).
does it mean that the c compiler takes char arrays (strings) as a pointer to the first char in the array
An array is not a pointer. But an array will implicitly convert to a pointer to first element. Such conversion is called "decaying".
... and moves in the memory till it gets the null terminator???
You can write such a loop if you know the pointer points to an element of a null-terminated string. If you write that loop, the compiler will produce a program that does exactly that.
Why wouldn't the same happen if we changed the char name[] to char *name?
Your premise is faulty. You can iterate an array directly, as well as using a pointer.
If this code gives the same addresses for both arguments, does it mean
The address of an object is the first byte of the object. What this "same address" means is that the first byte of the first element of the array is in the same address as the first byte of the array as a whole.
I know that arrays can't be assigned after declaration (unless you used something like strcpy, strcat) which is also confusing.
Neither strcpy nor strcat assign an array. They assign elements of the array which you can also do without calling those functions.
Why wouldn't C take them as any other data type?
This question is unclear. What do you mean by "C taking them"? Why do you think C should take another data type? Which data type do you think it should take?
char name[10] = "asd";
printf("%p\n%p", &name, &name[0]);
The arguments are of type char(*)[10] and char* respectively. The %p format specifier requires that the argument is of type similar to void* which isn't similar to those arguments. Passing an argument of a type other than required by the format specifier results in undefined behaviour. You should cast other pointer types to void* when using %p.
why this code does not seem to work the way I expect
char *c="hello";
char *x=malloc(sizeof(char)*5+1);
memcpy(x,(char(*)[2])c,sizeof("hello"));
printf("%s\n",x);
On this question I got the comment "you cannot cast a pointer to an array. But you can cast it to a pointer to array. Try (char*[2])c", so I am just casting to a pointer to an array of two char so that it gets the first two characters from c, because this is what (char(*)[2])c is supposed to do. If not, am I missing anything? I thought that since I am copying it, I would get junk at the indexes after 1 and 2 because I did not call memset. Why am I getting the full hello written with memcpy even though I cast it to (char(*)[2])?
How to extract a specific range of characters from a string by casting to an array type? Why can't it be done?
Converting a pointer does not change the memory the pointer points to. Converting the c to char [2] or char (*)[2] will not separate two characters from c.
c is char * that points to the first character of "hello".
(char (*)[2]) c says to take that address and convert it to the type “pointer to an array of 2 char”. The result points to the same address as before; it just has a different type. (There are some technical C semantic issues involved in type conversions and aliasing, but I will not discuss those in this answer.)
memcpy(x,(char(*)[2])c,sizeof("hello")); passes that address to memcpy. Due to the declaration of memcpy, that address is automatically converted to const void *. So the type is irrelevant (barring the technical issues mentioned above); whether you pass the original c or the converted (char (*)[2]) c, the result is a const void * to the same address.
sizeof "hello" is 6, because "hello" creates an array that contains six characters, including the terminating null character. So memcpy copies six bytes from "hello" into x.
Then x[5]='\0'; is redundant because the null character is already there.
To copy n characters from position p in a string, use memcpy(x, c + p, n);. In this case, you will need to manually append a null character if it is not included in the n characters. You may also need to guard against going beyond the end of the string pointed to by c.
In C99 a string is typically initialized by using the char* data type since there is no primitive "string" data type. This effectively creates an array of chars by storing the address of the first char in the variable:
FILE* out = fopen("out.txt", "w");
char* s = argv[1];
fwrite(s, 12, 1, out);
fclose(out);
//successfully prints out 12 characters from argv[1] as a consecutive string.
How does the compiler know that char* s is a string and not just the address of a singular char? If I use int* it will only allow one int, not an array of them. Why the difference?
My main focus is understanding how pointers, referencing and de-referencing work, but the whole char* keeps messing with my head.
How does the compiler know that char* s is a string and not just the address of a singular char?
It doesn't. As far as "the compiler" is concerned, char* s is a pointer to char.
On the other hand, there are many library functions that assume that a char* points to an element of a null-terminated sequence of char (see for example strlen, strcmp etc.).
Note that fwrite does not make this assumption. It requires that you tell it how many bytes you want to write (and that this number doesn't take you beyond the bounds of the buffer pointed at by the first argument.)
If I use int* it will only allow one int, not an array of them. Why the difference?
That is incorrect. The C language does not have a special case for char*. An int* can also point to an element of an array of int. In fact, you could write a library that uses 0 or another sentinel value to indicate the end of a sequence of int, and use it in much the same way as char* is used by convention.
In your code
fwrite(s, 12, 1, out);
is equivalent to writing
write 1 element of size 12 bytes, starting at address s, to the file pointed to by out.
Here, a char is exactly one byte, so 12 bytes are 12 characters and you get the desired output.
How does the compiler know that char* s is a string and not just the address of a singular char?
Well, it does not (and does not need to). You asked to (read from s and) write 12 bytes, so it will do that. If the memory is inaccessible, that's a programming mistake. fwrite() itself won't handle that.
Beware:
If s does not point to memory that is valid to access up to s[11], the behaviour is undefined. It's up to the programmer to pass valid values as arguments.
In case of int, the size is 4 bytes (usually, on 32 bit system) and printing byte-by-byte won't give you the desired result.
In that case, you need to make use of fprintf() to print formatted output.
The compiler has no idea beyond the fact that a char* is the address of a char. We can make it read the characters that follow by incrementing the address of the first character. The same holds for any pointer (int*, long*, etc.): the compiler just treats a pointer as something that points to an object of its type.
This is a general question about C. (I don't have a lot of experience coding in C.)
So, say I have a function that takes a char* as an argument. How do I know whether it's a pointer to a single char or a char array? If it's a char array I can expect a \0, but if it's not a char array then I wouldn't want to search for \0.
Is char* in argument a pointer to a single char or a char array?
Yes.
A parameter of type char* is always a pointer to a char object (or a null pointer, not pointing to anything, if that's what the caller passes as the corresponding argument).
It's not a pointer to an array (that would be, for example, a pointer of type char(*)[42]), but the usual way to access the elements of an array is via a pointer to the element type, not to the whole array. Why? Because an actual pointer-to-array must always specify the length of the array (42 in my example), which is inflexible and doesn't let the same function deal with arrays of different lengths.
A char* parameter can be treated just as a pointer to a single char object. For example, a function that gets a character of input might be declared like this:
bool get_next_char(char *c);
The idea here is that the function's result tells you whether it was successful; the actual input character is "returned" via the pointer. (This is a contrived example; <stdio.h> already has several functions that read characters from input, and they don't use this mechanism.)
Compare the strlen function, which computes the length of a string:
size_t strlen(const char *s);
s points to the first element of an array of char; internally, strlen uses that pointer to traverse the array, looking for the terminating '\0' character.
Ignoring the const, there's no real difference between the char* parameters for these two functions. In fact, C has no good way to distinguish between these cases: a pointer that simply points to a single object vs. a pointer that points to the first element of an array.
It does have a bad way to make that distinction. For example, strlen could be declared as:
size_t strlen(const char s[]);
But C doesn't really have parameters of array type at all. The parameter declaration const char s[] is "adjusted" to const char *s; it means exactly the same thing. You can even declare a length for something that looks like an array parameter:
void foo(char s[42]);
and it will be quietly ignored; the above really means exactly the same thing as:
void foo(char *s);
The [42] may have some documentation value, but a comment has the same value -- and the same significance as far as the compiler is concerned.
Any distinction between a pointer to a single object and a pointer to the first element of an array has to be made by the programmer, preferably in the documentation for the function.
Furthermore, this mechanism doesn't let the function know how long the array is. For char* pointers in particular, it's common to use the null character '\0' as a marker for the end of a string -- which means it's the caller's responsibility to ensure that that marker is actually there. Otherwise, you can pass the length as a separate argument, probably of type size_t. Or you can use any other mechanism you like, as long as everything is done consistently.
... because if it's a char array I can expect a \0 ...
No, you can't, at least not necessarily. A char* could easily point to the first element of a char array that's not terminated by a '\0' character (i.e., that doesn't contain a string). You can impose such a requirement if you like. The standard library functions that operate on strings impose that requirement -- but they don't enforce it. For example, if you pass a pointer to an unterminated array to strlen, the behavior is undefined.
Recommended reading: Section 6 of the comp.lang.c FAQ.
You cannot determine how many bytes are referenced by a pointer. You need to keep track of this yourself.
It is possible that a char array is NOT terminated with a \0 in which case you need to know the length of the array. Also, it is possible for an array to have a length of 1, in which case you have one character with no terminating \0.
The nice thing about C is that you get to define the details about data structures, thus you are NOT limited to a char array always ending with \0.
Some of the terms used to describe C data structures overlap. For example, an array is a sequential series of data elements, and an array of characters that ends with a null char (\0) is a string.
I have seen in several pieces of code a string declared as char*. How does this work, surely it is a pointer to a single char, not an array of chars which makes up a string. If I wished to take string input to a method that would be called like this:
theMethod("This is a string literal");
What datatype should the parameter be?
surely it is a pointer to a single char, not an array of chars
It's a pointer to the first character of an array of char. One can access each element of the array using a pointer to its first element by performing pointer arithmetic and "array" indexing.
What datatype should the parameter be?
const char *, if you don't wish to modify the characters from within the function (this is the general case), and char * if you do.
This is a common beginner-C confusion. A pointer to any type, T *, is ambiguously either a pointer to a single object of type T, or a pointer to an element within a linear array of objects of type T, size unspecified. You, the programmer, are responsible for knowing which is which, and passing around length information as necessary. If you get it wrong, the compiler stands by and watches as your program drives off the undefined-behavior cliff.
To the extent C has strings (there is a strong case to be made that it doesn't really) they take shameless advantage of this ambiguity, such that when you see char * or const char * in a C program, it almost always will be a pointer to a string, not a single char. The same is not true of pointers to any other type.
In practice, a "string" is handled as a char * (or unsigned char * or const char *): a pointer to the first character of a chain of characters (I don't want to use the words array or vector). Contrast this with a single char, which is written with single quotes: 'x'.
This is good old C programming (sometimes I could cry for losing it).
char p[] = "i am here"; // a writable array; modifying a string literal is undefined behavior
char *q;
for (q = p; *q != '\0'; ++q) { // start at p, walk through, and end at the \0 after the last 'e'
    if (*q == 'h') { // find the first 'h' and cut the string there
        *q = '\0';
        break;
    }
}
I left out const and other details here; I am just trying to clarify.
All this time I thought that whenever I need to copy a string(either literal or in a variable) I need to use strcpy(). However I recently found out this:
char a[]="test";
and this
char *a="test";
From what I understand the second type is unsafe and will print garbage in some cases. Is that correct? What made me even more curious is why the following doesn't work:
char a[5];
a="test";
or this
char a[];
a="test";
but this works however
char *a;
a="test";
I would be grateful if someone could clear things up a bit.
char a[]="test";
This declares and initializes an array of size 5 with the contents of "test".
char *a="test";
This declares and initializes a pointer to the literal "test". Attempting to modify a literal through a is undefined behavior (and probably results in the garbage you are seeing). It's not unsafe, it just can't be modified since literals are immutable.
char a[5];
a="test";
This fails even though both a and "test" have the exact same type, just like any other attempt to copy arrays through assignment.
char a[];
a="test";
This declares an array of unknown size. The declaration should be completed before being used.
char *a;
a="test";
This works just fine since "test" decays to a pointer to the literal's first element. Attempting to modify its contents is still undefined behavior.
Let's examine case by case:
char a[]="test";
This tells the compiler to allocate 5 bytes on the stack, put 't' 'e' 's' 't' and '\0' on it. Then the variable a points to where 't' was written and you have a pointer pointing to a valid location with 5 available spaces. (That is if you view a as a pointer. In truth, the compiler still treats a as a single custom type that consists of 5 chars. In an extreme case, you can imagine it something like struct { char a, b, c, d, e; } a;)
char *a="test";
"test" (which like I said is basically 't' 'e' 's' 't' and '\0') is stored somewhere in your program, say a "literal's area", and a is pointing to it. That area is not yours to modify but only to read. a by itself doesn't have any specific memory (I am not talking about the 4/8 bytes of pointer value).
char a[5];
a = "test";
You are telling the compiler to copy the contents of one string over to another one. This is not a simple operation. In the case of char a[] = "test"; it was rather simple because it was just 5 pushes on the stack. In this case however it is a loop that needs to copy 1 by 1.
Defining char a[];, well, I don't think that's even possible, is it? You are asking for a to be an array whose size would be determined when it is initialized. When there is no initialization, it just doesn't make sense.
char *a;
a = "test";
You are defining a as a pointer to char. When you assign "test" to it, a just points to the literal; it doesn't have any memory of its own for the characters, exactly like the case of char *a = "test";
Like I said, assigning arrays (whether null-terminated arrays of char (string) or any other array) is a non-trivial task that the compiler doesn't do for you, that is why you have functions for it.
Do not confuse assignment and initialisation in C, they are different.
In C a string is not a data type; it is a convention that uses an array and a nul terminator. Like any array, when you use it in an assignment, its name resolves to a mere pointer. You can assign a pointer, but that is not the same as assigning a string.