Strings in C: Why does this work?

I'm sorry for asking something that probably seems a little inane as it is apparently not broken but my (newbie) understanding of how C handles string literals tells me that this should not work...
char some_array_of_strings[3][200];
strcpy(some_array_of_strings[2], "Some garbage");
strcpy(some_array_of_strings[2], "Some other garbage");
I thought that C prevented the direct modification of string literals and that was why pointers were used when dealing with strings. The fact that this works tells me I am misunderstanding something.
Also, if this works, why does...
some_array_of_strings[1]="Some garbage"
some_array_of_strings[1]="Some garbage that causes a compiler error due to reassignment"
not work?

Be careful with the phrase "array of strings". A "string" is not a data type in C; it's a data layout. Specifically, a string is defined as
a contiguous sequence of characters terminated by and including the
first null character
An array of char may contain a string, and a char* pointer may point to (the first character of) a string. (The standard defines a pointer to the first character of a string as a pointer to a string.)
char some_array_of_strings[3][200];
This defines a 3-element array, where each of the elements is a 200-element array of char. (It's a 2-dimensional array, which in C is simply an array of arrays.)
strcpy(some_array_of_strings[2], "Some garbage");
The string literal "Some garbage" refers to an anonymous statically allocated array of char; it exists for the entire execution of your program, and you're not allowed to modify it. The strcpy() call, as the name implies, copies the contents of that array, up to and including the terminating '\0' null character, into some_array_of_strings[2].
strcpy(some_array_of_strings[2], "Some other garbage");
Same thing: this copies the contents "Some other garbage" into some_array_of_strings[2], overwriting what you copied on the previous line. In both cases, there's more than enough room.
You're not modifying a string literal, you're modifying your own array by copying bytes from a string literal (more precisely, from that anonymous array I mentioned above).
some_array_of_strings[1]="Some garbage";
This doesn't just "not work", it's illegal. There is no assignment of arrays in C.
Let's take a simpler example:
char arr[10];
arr = "hello"; /* also illegal */
arr is an object of array type. In most contexts, an expression of array type is implicitly converted to a pointer to the array object's first element. That applies to both sides of the assignment: the object name arr and the string literal "hello".
But the pointer on the left side is just a pointer value. There is no pointer object. In technical terms, it's not an lvalue, so it can't appear on the left side of an assignment (any more than you could write 42 = x;).
(If the array-to-pointer conversion didn't happen, it would still be illegal, because C doesn't permit array assignments.)
Some more detail on the issue of arrays on the left side of assignment:
The contexts where an array expression doesn't decay into a pointer are when the array expression is:
an operand of the unary sizeof operator;
an operand of unary & (address-of) operator; or
a string literal in an initializer used to initialize an array object.
The left side of an assignment isn't any of those contexts, so in:
char array[10];
array = "hello";
LHS is, in principle, converted to a pointer. But the resulting pointer expression is no longer an lvalue, which makes the assignment a constraint violation.
One way to look at it is that the expression array is converted to a pointer, which then makes the assignment illegal. Another is that since the assignment is illegal, the whole program is not valid C, so it has no defined behavior and it's meaningless to ask whether any conversion does or does not happen.
(I'm playing a little fast and loose with my use of the word "illegal", but this answer is long enough already so I won't get into it.)
Recommended reading: Section 6 of the comp.lang.c FAQ; it does an excellent job of explaining the often bewildering relationship between arrays and pointers in C.

you aren't modifying the string literal, you are using it as a source to copy it into your array of characters. Once the copy is finished, your string literal has nothing to do with the copy in your array. You are free to then manipulate the array.

From your definition, char some_array_of_strings[3][200]; indicates that some_array_of_strings is an array of 3 elements, each of which is itself an array of 200 characters, able to hold a string of up to 199 characters plus the terminating null character.
strcpy(some_array_of_strings[2], "Some garbage");
strcpy(some_array_of_strings[2], "Some other garbage");
In these 2 statements, you are copying the characters of the string literal into your array, which is valid. some_array_of_strings[2] has type char[200], and when it is passed to strcpy it is converted to a char * pointing to its first element.
some_array_of_strings[1]="Some garbage";
some_array_of_strings[1]="Some garbage that causes a compiler error due to reassignment";
Here, you are assigning a char * like "Some garbage" to a char[200] i.e. some_array_of_strings[1] which is not supported. The difference lies in assigning and copying the content.

some_array_of_strings[1]="Some garbage"
some_array_of_strings[1]="Some garbage that causes a compiler error due to reassignment"
Neither line compiles. some_array_of_strings[1] is an array of 200 char, and an array cannot appear on the left side of =, so the first assignment is already an error; it does not make the array point to the string literal. The second line fails for the same reason, not because it is a reassignment.
It is just as Keith and Fred have said: with strcpy you are only copying the characters of the string literal into your array.

some_array_of_strings[2] is an array of 200 chars.
When it's used in most expressions, it "decays" (fancy word for converts) into a pointer to the first element of the array.
strcpy(some_array_of_strings[2], "Some garbage"); then copies "Some garbage" character by character into that array of 200 chars, by making use of the pointer to the first element of the array and advancing it one by one.
In most expressions "Some garbage" is converted to a pointer to the first element of an array of chars containing those respective characters plus a string termination character ('\0').
some_array_of_strings[1]="Some garbage" on the other hand attempts to assign a pointer (to the string) to the constant/non-modifiable pointer to the first element of 200 chars, which is also illegal (like doing 1=2;)

Related

I understand that array types have no = operator, but why does casting a pointer to a pointer-to-array not work as I expect?

Why does this code not work the way I expect?
char *c="hello";
char *x=malloc(sizeof(char)*5+1);
memcpy(x,(char(*)[2])c,sizeof("hello"));
printf("%s\n",x);
On this question I got the comment: you cannot cast a pointer to an array. But you can cast it to a pointer to array. Try (char*[2])c. So I am just casting to a pointer to an array of two char, expecting it to take the first two characters from c, because that is what I thought (char(*)[2])c is supposed to do. If not, what am I missing? I also expected junk at the indexes after the first two, because I did not call memset. Why does memcpy write the full hello even though I cast to (char(*)[2])?
How can I extract a specific range of characters from a string by casting to an array type? Why can't it be done?
Converting a pointer does not change the memory the pointer points to. Converting the c to char [2] or char (*)[2] will not separate two characters from c.
c is char * that points to the first character of "hello".
(char (*)[2]) c says to take that address and convert it to the type “pointer to an array of 2 char”. The result points to the same address as before; it just has a different type. (There are some technical C semantic issues involved in type conversions and aliasing, but I will not discuss those in this answer.)
memcpy(x,(char(*)[2])c,sizeof("hello")); passes that address to memcpy. Due to the declaration of memcpy, that address is automatically converted to const void *. So the type is irrelevant (barring the technical issues mentioned above); whether you pass the original c or the converted (char (*)[2]) c, the result is a const void * to the same address.
sizeof "hello" is 6, because "hello" creates an array that contains six characters, including the terminating null character. So memcpy copies six bytes from "hello" into x.
Then x[5]='\0'; is redundant because the null character is already there.
To copy n characters from position p in a string, use memcpy(x, c + p, n);. In this case, you will need to manually append a null character if it is not included in the n characters. You may also need to guard against going beyond the end of the string pointed to by c.

Use of pointers to store character strings

I started learning pointers in C. I understood it fine until I came across the topic "Using Pointers to store character arrays".
A sample program to highlight my doubt is as follows
#include <stdio.h>
int main(void)
{
    char *string;
    string = "good";
    printf("%s", string);
    return 0;
}
This prints the character string, i.e, good.
Pointers are supposed to store memory addresses, or in other words, we assign the address of a variable (using the address operator) to a pointer variable.
What I don't understand is how are we able to assign a character string directly to the pointer? That too without address operator?
Also, how are we able to print the string without the indirection operator (*) ?
A literal string like "good" is really stored as a (read-only) array of characters. Also, all strings in C must be terminated with a special "null" character '\0'.
When you do the assignment
string = "good";
what is really happening is that you make string point to the first character in that array.
Functions handling strings know how to deal with pointers like that, and know how to loop over such arrays using the pointer to find all the characters in the string until they find the terminator.
Looking at it a little differently, the compiler creates its array
char internal_array[] = { 'g', 'o', 'o', 'd', '\0' };
then you make string point to the first element in the array
string = &internal_array[0];
Note that &internal_array[0] is actually equal to internal_array, since arrays naturally decay to pointers to their first element.
"cccccc" is a string literal which is actually the char array stored in the ReadOnly memory. You assign the pointer to the address of the first character of this literal.
If you want a modifiable copy of the string literal in RAM, you need to:
char string[] = "fgdfdfgdfgf";
Bear in mind that array initialization (when you declare it) is the only place where you can use = to copy the string literal into the char array (string).
In any other circumstances you need to use the appropriate library function, for example:
strcpy(string, "asdf");
(the string has to have enough space to accommodate the new string)
What I don't understand is how are we able to assign a character string directly to the pointer? That too without address operator?
When an array is assigned to something, the array is converted to a pointer.
"good" is a string literal. It has type array 5 of char, which includes a trailing null character. It exists in memory that should not be written to. Attempting to write is undefined behavior (UB). It might "work", it might not. Code may die, etc.
char *string; declare string as pointer to char.
string = "good"; causes an assignment. The operation takes "good" and converts that array to the address and type (char*) of its first element 'g'. Then assigns that char * to string.
Also, how are we able to print the string without the indirection operator (*) ?
printf() expects a char * - which matches the type of string.
printf ("%s", string); passes string to printf() as a char * - no conversion is made. printf ("%s",... expects to see a "... the argument shall be a pointer to the initial element of an array of character type." then "Characters from the array are written up to (but not including) the terminating null character." C11 §7.21.6.1 8.
Your first question:
What I don't understand is how are we able to assign a character string directly to the pointer? That too without address operator?
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, for e.g. "good".
From C Standard#6.4.5 [String literals]:
...The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.....
In C, an expression that has type array of type is converted to an expression with type pointer to type that points to the initial element of the array object [there are few exceptions]. Hence, the string literal which is an array decays into pointer which can be assigned to the type char *.
In the statement:
string = "good";
string will point to the initial character in the array where "good" is stored.
Your second question:
Also, how are we able to print the string without the indirection operator (*) ?
From printf():
s
writes a character string
The argument must be a pointer to the initial element of an array of characters...
So, format specifier %s expect pointer to initial element which is what the variable string is - a pointer to initial character of "good". Hence, you don't need indirection operator (*).

Pointers and Arrays in c, what is the difference? [duplicate]

I am learning about pointers. I don't understand what the difference between the char variables is.
#include <stdio.h>
int main() {
char *cards = "JQK";
char cards2[] = "JQK";
printf("%c\n", cards[2]);
printf("%c\n", cards2[2]);
}
I experimented with them in the printf() and they seem to be working the same way except that cards2[] can't be reassigned while cards can. Why is this the case?
The difference is that the second is an array object initialized with the contents of the string literal. The first is a char * that holds the address of the string literal's first character. String literals are arrays: the array is converted into a pointer to its first element, and that pointer is assigned to the char *.
The thing is, an array is a non-modifiable lvalue: it can't appear on the left side of the = assignment operator. A pointer (one not marked const) can. And since the pointer points to a string literal, you shouldn't try to modify what it points to. Doing so invokes undefined behavior.
Pointers and arrays are not the same thing; arrays decay into pointers in most cases. That is what happened to the string literal on the right-hand side of the pointer initialization. The second case is different: initializing an array from a string literal copies the contents of the literal into the declared array (cards2 here), which is why that array is modifiable, unlike the literal in the first case.
To clarify a bit, let's first look at what is going on and at the difference between an array and a pointer.
Here I have said that string literals are arrays. From §6.4.5¶6
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals.78) The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence. For UTF-8 string literals, the array elements have type char, and are initialized with the characters of the multibyte character sequence, as encoded in UTF-8.
This is what is written in the C11 standard; it is quoted to show that a string literal is indeed a character array of static storage duration. Now the question is what happens when we write
char *cards = "JQK";
So now we have an array on the right-hand side of a pointer initialization. What will happen?
Now comes the concept of array decay. In most cases an array is converted into a pointer to its first element. This may seem strange at first, but this is what happens. For example,
int a[]={1,2,3};
Now if you write a[i] this is equivalent to *(a+i) and a is the decayed pointer to the first element of the array. Now you are asking to go to the position a+i and give the value that is in that address, which is 3 for i=2. Same thing happened here.
The pointer to the first element of the literal array is assigned to cards. Now what happened next? cards points to the first element of the literal array which is J.
The story doesn't end here. The standard imposes a restriction on string literals: you are not supposed to change them. If you try, the behavior is undefined, which, as the name implies, means the standard does not define what happens. You should avoid it. So what does that mean? You shouldn't try to change the string literal, or whatever cards points to. Most implementations put string literals in a read-only section, so trying to write to one is an error.
That being said, what happened in the second case? The only difference is that this time we say
char cards2[] = "JQK";
Now cards2 is an array object - to the right of the assignment operator there is string literal again. What will happen? From §6.7.9¶14
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
This means you can put a string literal on the right side of the = in an array declaration, and the char array will be initialized with a copy of it. That is what is being done here. The array is modifiable, so you can change it, unlike the previous case. That is the key difference.
Also, if you are curious whether there are cases where an array is treated as an array and not as a pointer, the whole rule is stated here.
From §6.3.2.1¶3
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type 'array of type' is converted to an expression with type 'pointer to type' that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
That is all there is to it. I hope this gives you a clearer idea than you had before.

How can a char pointer contain a string in it?

If I define a char pointer in a C program and initialize it to "some String"
Can someone explain how a char pointer that's supposed to hold an address in it can hold a string in it?
Isn't it a contradiction to the definition of a pointer? What am I missing here?
For example :
char *pointer=" how is it possible at all ? ";
printf("%s",pointer);
String literals, like "how is it possible at all ? " are really arrays of read-only characters stored somewhere by the compiler.
When you do
char *pointer=" how is it possible at all ? ";
you initialize pointer to point to the first element of that array.
This is very similar to
char string[] = " how is it possible at all ? ";
char *pointer = &string[0]; // Make pointer point to the first character in the array
How pointers themselves work depends on the compiler and the target architecture, but most of the time they are simple integers whose value is the address of the memory they point to. Then the compiler handles them specially and translates usage of the pointers into the correct machine-code instructions to access the memory a pointer is pointing to.
Because string literals are read-only, you should really use const char * when making pointers to them. C allows plain non-constant char *, but then the compiler might not be able to detect attempts to modify the read-only literal, which lead to undefined behavior.
In the statement:
char *pointer=" how is it possible at all ? ";
" how is it possible at all ? " is a string literal and a string literal is an array of characters.
From C Standards#6.3.2.1p3
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ''array of type'' is converted to an expression with type ''pointer to type'' that points to the initial element of the array object and is not an lvalue.
So, the string literal, which is an array, decays into a pointer to char (type char *).
Can someone explain how a char pointer that's supposed to hold an address in it can hold a string in it?
Another way to look at it, by analogy: the same way a huge building can have a street address such as "123 Main Street".
The address is where it is - not how big it is.

c string basics, why unassigned?

I am trying to learn the basics, I would think that declaring a char[] and assigning a string to it would work.
thanks
int size = 100;
char str[size];
str = "\x80\xbb\x00\xcd";
gives error "incompatible types in assignment". what's wrong?
thanks
You can use a string literal to initialize an array of char, but you can't assign an array of char (any more than you can assign any other array). OTOH, you can assign a pointer, so the following would be allowed:
char *str;
str = "\x80\xbb\x00\xcd";
This is actually one of the most difficult parts of learning a programming language.... str is an array, that is, a part of memory (size times a char, so size chars) that has been reserved and labeled as str. str[0] is the first character, str[1] the second... str[size-1] is the last one. str itself, without specifying any character, is converted in most expressions to a pointer to the first element of the memory zone that was created when you did
char str[size]
As Jerry so clearly said, in C you cannot assign to arrays that way. You need to copy from one array to the other, so you can do something like this
strncpy(str, "\x80\xbb\x00\xcd", size); /* Copy up to size characters */
str[size-1]='\0'; /* Make sure that the string is null terminated for small values of size */
Summarizing: It's very important to make a difference between pointers, memory areas and array.
Good luck - I am pretty sure that in less time than you imagine you will be mastering these concepts :)
A char array is implicitly converted to a char * when used as an rvalue, but not when used as an lvalue - that's why the assignment won't work.
You cannot assign array contents using the = operator. That's just a fact of the C language design. You can initialize an array in the declaration (provided its size is a constant, since a variable-length array cannot have an initializer), such as
char str[100] = "\x80\xbb\x00\xcd";
but that's a different operation from an assignment. And note that in this case, an extra '\0' will be added to the end of the string.
The "incompatible types" error comes from how array expressions are treated by the language. First of all, string literals are stored as arrays of char with static extent (meaning they exist over the lifetime of the program). So the type of the string literal "\x80\xbb\x00\xcd" is "5-element array of char". However, in most circumstances, an expression of array type will implicitly be converted ("decay") from type "N-element array of T" to "pointer to T", and the value of the expression will be the address of the first element in the array. So, when you wrote the statement
str = "\x80\xbb\x00\xcd";
the type of the literal was implicitly converted from "5-element array of char" to "pointer to char", but the target of the assignment is type "100-element array of char", and the types are not compatible (above and beyond the fact that an array expression cannot be the target of the = operator).
To copy the contents of one array to another you would have to use a library function like memcpy, memmove, strcpy, etc. Also, for strcpy to function properly, the source string must be 0-terminated.
To assign a string literal to the str array you can use the string copy function strcpy.
char a[100] = "\x80\xbb\x00\xcd"; OR char a[] = "\x80\xbb\x00\xcd";
str is the name of an array. In most expressions the name of an array is converted to the address of its 0th element, and that address is a value, not a variable; in that sense str behaves like a pointer constant. You cannot change the value of a pointer constant, just like you cannot change a constant (you can't do 6 = 5, for example).

Resources