If this code is correct:
char v1[ ] = "AB";
char v2[ ] = {"AB"};
char v3[ ] = {'A', 'B'};
char v4[2] = "AB";
char v5[2] = {"AB"};
char v6[2] = {'A', 'B'};
char *str1 = "AB";
char *str2 = {"AB"};
Then why is this other one not?
char *str3 = {'A', 'B'};
To the best of my knowledge (please correct me if I'm wrong at any point) "AB" is a string literal and 'A' and 'B' are characters (integers,scalars). In char *str1 = "AB"; the string literal "AB" is defined and the char pointer is set to point to that string literal (to the first element). With char *str3 = {'A', 'B'}; two characters are defined and stored in subsequent memory positions, and the char pointer "should" be set to point to the first one. Why is that not correct?
In a similar way, a regular char array like v3[] or v6[2] can indeed be initialized with {'A', 'B'}. The two characters are defined, the array is set to point to them and thus, being "turned into" or treated like a string literal. Why does a char pointer like char *str3 not behave in the same way?
Just for the record, gcc compiler warnings I get are "initialization makes pointer from integer without a cast" when it gets to the 'A', and "excess elements in scalar initializer" when it gets to the 'B'.
Thanks in advance.
There is one thing you need to learn about constant string literals. Except when used to initialize an array (for example in the case of v1 in your example code) constant string literals are themselves arrays. For example if you use the literal "AB" it is stored somewhere by the compiler as an array of three characters: 'A', 'B' and the terminator '\0'.
When you initialize a pointer to point to a literal string, as in the case of str1 and str2, then you are making those pointers point to the first character in those arrays. You don't actually create an array named str1 (for example) you just make it point somewhere.
The definition
char *str1 = "AB";
is equivalent to
char *str1;
str1 = "AB";
Or rather
char unnamed_array_created_by_compiler[] = "AB";
char *str1 = unnamed_array_created_by_compiler;
There are also other problematic things with the definitions you show. First of all, the arrays v3, v4, v5 and v6: you tell the compiler they will be arrays of two char elements. That means you cannot use them as strings in C, since strings need the special terminator character '\0'.
In fact, if you check the sizes of v1 and v2 you will see that they are indeed three bytes large: one for each of the characters plus the terminator.
Another important thing you miss is the constant part: while string literals are arrays of char, they are effectively read-only, even if not stored as such. That's why you should never make a plain pointer to char (like str1 and str2) point to them; you should use pointers to constant char instead, i.e.
const char *str1 = "AB";
Double quotes (" ") denote a string literal and single quotes (' ') denote a character constant. Storage is allocated for a string literal, but not for a character constant, which is just a value. A pointer has to point at some existing storage, whereas an array of characters provides its own storage for the values in its initializer.
Related
For example in this code:
char *ptr = "string";
Is there a null terminator stored at the ptr[6] address?
When I test this and print a string, it prints "string", and if I print the ptr[6] char I get ''. I wanted to test this further so I did some research and found someone saying that strlen will always crash if there is not a null terminator. I ran this in my code and it returned 6, so does this mean that assigning a string to a char pointer initializes with a null terminator or am I misunderstanding what's happening?
Yes. String literals used as pointers will always end in a NUL byte. String literals used as array initializers will too, unless you specify a length that's too small for it to (e.g., char arr[] = "string"; and char arr[7] = "string"; both will, but char arr[6] = "string"; won't).
I know similar questions, like this question, have been posted and answered here but those answers don't offer me the complete picture, hence I'm posting this as a new question. Hope that is ok.
See following snippets -
char s[9] = "foobar"; //ok
s[1] = 'z'; //also ok
And
char s[9];
s = "foobar"; //doesn't work. Why?
But see following cases -
char *s = "foobar"; //works
s[1] = 'z'; //doesn't work
char *s;
s = "foobar"; //unlike arrays, works here
It is a bit confusing. I mean I have vague understanding that we can't assign values to arrays. But we can modify it. In case of char *s, it seems we can assign values but can't modify it because it is written in read only memory. But still I can't get the full picture.
What exactly is happening at low level?
char s[9] = "foobar"; This is initialization. An array of 9 characters is declared and its contents receive the string "foobar", with any remaining characters set to '\0'.
s = "foobar"; is not valid C: you cannot assign to an array. To make s hold the value foobar, use strcpy(s, "foobar");
char *s = "foobar"; is also initialization; however, this stores the address of the constant string foobar in the pointer variable s. Note that I say "constant string": on most platforms a string literal is stored as constant data. A better way of making this clear is to write const char *s = "foobar";
And indeed, your next assignment s[1] = 'z'; will not work, because the string that s points to is constant.
You need to understand what the expressions are actually doing, then it might come clear to you.
char s[9] = "foobar"; -> Initialize the char array s by the string literal "foobar". Correct.
s[1] = 'z'; -> Assign the character constant 'z' to the second element of char array s. Correct.
char s[9]; s = "foobar"; -> Declare the char array s, then attempt to assign the string literal "foobar" to the char array. Not permissible. You can't actually assign arrays in C, you can only initialize an array of char with a string when defining the array itself. That's the difference. If you want to copy a string into an array of char use strcpy(s, "foobar"); instead.
char *s = "foobar"; -> Define the pointer to char s and initialize it to point to the string literal "foobar". Correct.
s[1] = 'z'; -> Attempt to modify the string literal "foobar", to which s is pointing. Not permissible. A string literal is typically stored in read-only memory, and modifying it is undefined behavior.
char *s; s = "foobar"; -> Declare the pointer to char s. Then assign the pointer to point to the string literal "foobar". Correct.
This declares array s with an initializer:
char s[9] = "foobar"; //ok
But this is an invalid assignment expression with array s on the left:
s = "foobar"; //doesn't work. Why?
Assignment expressions and declarations with initializers are not the same thing syntactically, although they both use an = in their syntax.
The reason that the assignment to the array s doesn't work is that the array decays to a pointer to its first element in the expression, so the assignment is equivalent to:
&(s[0]) = "foobar";
The assignment expression requires an lvalue on the left hand side, but the result of the & address operator is not an lvalue. Although the array s itself is an lvalue, the expression converts it to something that isn't an lvalue. Therefore, an array cannot be used on the left hand side of an assignment expression.
For the following:
char *s = "foobar"; //works
The string literal "foobar" is stored as an anonymous array of char and as an initializer it decays to a pointer to its first element. So the above is equivalent to:
char *s = &(("foobar")[0]); //works
The initializer has the same type as s (char *) so it is fine.
For the subsequent assignment:
s[1] = 'z'; //doesn't work
It is syntactically correct and the compiler is not required to diagnose it, but it results in undefined behavior. The rule being broken is that the anonymous arrays created by string literals must not be modified. Assignment to an element of such an array is a modification and not allowed.
The subsequent assignment:
s = "foobar"; //unlike arrays, works here
is equivalent to:
s = &(("foobar")[0]); //unlike arrays, works here
It is assigning a char * value to a variable of type char *, so it is fine.
Contrast the following use of the initializer "foobar":
char *s = "foobar"; //works
with its use in the earlier declaration:
char s[9] = "foobar"; //ok
There is a special initialization rule that allows an array of char to be initialized by a string literal optionally enclosed by braces. That initialization rule is being used to initialize char s[9].
The string literal used to initialize the array also creates an anonymous array of char (at least notionally) but there is no way to access that anonymous array of char, so it may get omitted from the output of the compiler. This is in contrast with the anonymous array of char created by the string literal used to initialize char *s which can be accessed via s.
It may help to think of C as not allowing you to do anything with arrays except for assisting in a few special cases. C originated when programming languages did little more than help you move individual bytes and “words” (2 or maybe 4 bytes) around and do simple arithmetic and operations with them. With that in mind, let’s look at your examples:
char s[9] = "foobar"; //ok
This is one of the special cases: When you define an array of characters, the compiler will help you initialize it. In a definition, you may provide a string literal, which represents an array of characters, and the compiler will initialize your array with the contents of the string literal.
s[1] = 'z'; //also ok
Yes, this just moves the value of one character into one array element.
char s[9];
s = "foobar"; //doesn't work. Why?
This does not work because there is no assistance here. s and "foobar" are both arrays, but C has no provision for handling an array as one whole object.
However, although C does not handle an array as a whole object, it does provide some assistance for working with arrays. Since the compiler would not work with whole arrays, programmers needed some other ways to work with arrays. So C was given a feature that, when you used an array in an expression, the compiler would automatically convert it to a pointer to the first element of the array, and that would help the programmer write code to work with elements of the array. We see that in your next example:
char *s = "foobar"; //works
char *s declares s to be a pointer to char. Next, the string literal "foobar" represents an array. Above, we saw that using a string literal to initialize an array was a special case. However, here the string literal is not used to initialize an array. It is used to initialize a pointer, so the special case rules do not apply. In this case, the array represented by the string literal is automatically converted to a pointer to its first element. So s is initialized to be a pointer to the first element of the array containing “f”, “o”, “o”, “b”, “a”, “r”, and a null character.
s[1] = 'z'; //doesn't work
The arrays defined by string literals are intended to be constants. They are “read-only” in the sense that the C standard does not define what happens when you try to modify them. In many C implementations, they are assigned to memory that is read-only because the operating system and the computer hardware do not allow writing to it by normal program means. So s[1] = 'z'; may get an exception (trap) or a warning or error message from the compiler. (Ideally, char *s = "foobar"; would be disallowed because "foobar", being a constant, would have type const char [7]. However, because const did not exist in early C, the types of string literals do not have const.)
char *s;
s = "foobar"; //unlike arrays, works here
Here s is a char *, and the string literal "foobar" is automatically converted to a pointer to its first element, and that pointer is a char *, so the assignment is fine.
Firstly, I included C++ since C++ is derived from C, so I'm guessing both answers apply here, although the language I'm asking about and focusing on in this question is C, and not C++.
So I began reading the C book 'Head First C' not so long ago. In the book (page 43/278) it answers a question for you: are there any differences between literal strings and character arrays?
I was totally thrown by this as I didn't know what a literal string was. I understand a string is just a array of characters, but what makes a 'string' literal? And why is it mentioning string in C if C doesn't actually provide any class (like a modern language such as C# or Java would) for string.
Can anyone help clean up this confusion? I really struggle to understand what Microsoft had to say about this here and think I need a more simple explanation I can understand.
A string literal is an unnamed string constant in the source code. E.g. "abc" is a string literal.
If you do something like char str[] = "abc";, then you could say that str is initialized with a literal. str itself is not a literal, since it's not unnamed.
A string (or C-string, rather) is a contiguous sequence of bytes, terminated with a null byte.
A char array is not necessarily a C-string, since it might lack a terminating null byte.
What is a literal string & char array in C?
C has 2 kinds of literals: string literals and compound literals. Both are unnamed and both can have their address taken. string literals can have more than 1 null character in them.
In the C library, a string is characters up to and including the first null character. So a string always has one and only one null character, else it is not a string. A string may be char, signed char, unsigned char.
//         v-----v string literal 6 char long
char *s1 = "hello";
char *s2 = "hello\0world";
//         ^------------^ string literal 12 char long
char (*s3)[6] = &"hello"; // valid
//        v------------v compound literal
int *p1 = (int []){2, 4};
int (*p2)[2] = &(int []){2, 4}; // valid
C specifies values like 123, 'x' and 456.7 as constants, not literals. These constants cannot have their address taken.
int *p3 = &7; // not valid
C++ and C differ in many of these regards.
A char array is an array of char. An array may contain many null characters.
char a1[3]; // `a1` is a char array size 3
char a2[3] = "123"; // `a2` is a char array size 3 with 0 null characters
char a3[4] = "456"; // `a3` is a char array size 4
char a4[] = "789"; // `a4` is a char array size 4
char a5[4] = { 0 }; // `a5` is a char array size 4, all null characters
The following t* are not char arrays, but pointers to char.
char *t1;
char *t2 = "123";
char *t3 = &(char){'x'};
Are C constant character strings always null terminated without exception?
For example, will the following C code always print "true":
const char* s = "abc";
if( *(s + 3) == 0 ){
printf( "true" );
} else {
printf( "false" );
}
A string is only a string if it contains a null character.
A string is a contiguous sequence of characters terminated by and including the first null character. C11 §7.1.1 1
"abc" is a string literal. It also always contains a null character. A string literal may contain more than 1 null character.
"def\0ghi" // 2 null characters.
In the following, though, x is not a string (it is an array of char without a null character). y and z are both arrays of char and both are strings.
char x[3] = "abc";
char y[4] = "abc";
char z[] = "abc";
With OP's code, s points to a string, the string literal "abc"; *(s + 3) and s[3] have the value 0. Attempting to modify s[3] is not allowed, as 1) s is a const char *, so the compiler rejects the assignment, and 2) the data pointed to by s is a string literal, and attempting to modify a string literal is undefined behavior.
const char* s = "abc";
Deeper: C does not define "constant character strings".
The language defines a string literal, like "abc" to be a character array of size 4 with the value of 'a', 'b', 'c', '\0'. Attempting to modify these is UB. How this is used depends on context.
The standard C library defines string.
With const char* s = "abc";, s is a pointer to data of type const char. Because s is a const some_type * pointer, it cannot be used to modify the data without a cast. s is initialized to point to the string literal "abc". s itself is not a string. The memory s initially points to is a string.
In short, yes. A string constant is of course a string and a string is by definition 0-terminated.
If you use a string constant as an array initializer like this:
char x[5] = "hello";
you won't have a 0 terminator in x simply because there's no room for it.
But with
char x[] = "hello";
it will be there and the size of x is 6.
The notion of a string is defined as a sequence of characters terminated by a zero character. It is not important whether the sequence is modifiable or not, that is, whether the corresponding declaration has the qualifier const or not.
For example string literals in C have types of non-constant character arrays. So you may write for example
char *s = "Hello world";
In this declaration the identifier s points to the first character of the string.
You can initialize a character array yourself by a string using a string literal. For example
char s[] = "Hello world";
This declaration is equivalent to
char s[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
However in C you may exclude the terminating zero from an initialization of a character array.
For example
char s[11] = "Hello world";
Though the string literal used as the initializer contains the terminating zero it is excluded from the initialization. As result the character array s does not contain a string.
In C, there isn't really a "string" datatype like in C++ and Java.
Important principle that every competent computer science degree program should mention: Information is symbols plus interpretation.
A "string" is defined conventionally as any sequence of characters ending in a null byte ('\0').
The "gotcha" that's being posted (character/byte arrays with the value 0 in the middle of them) is only a difference of interpretation. Treating a byte array as a string versus treating it as bytes (numbers in [0, 255]) has different applications. Obviously if you're printing to the terminal you might want to print characters until you reach a null byte. If you're saving a file or running an encryption algorithm on blocks of data you will need to support 0's in byte arrays.
It's also valid to take a "string" and optionally interpret it as a byte array.
When using the strtok function, using a char * instead of a char [] results in a segmentation fault.
This runs properly:
char string[] = "hello world";
char *result = strtok(string, " ");
This causes a segmentation fault:
char *string = "hello world";
char *result = strtok(string, " ");
Can anyone explain what causes this difference in behaviour?
char string[] = "hello world";
This line initializes string to be a big-enough array of characters (in this case char[12]). It copies those characters into your local array as though you had written out
char string[] = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
The other line:
char* string = "hello world";
does not initialize a local array, it just initializes a local pointer. The compiler is allowed to set it to a pointer to an array which you're not allowed to change, as though the code were
const char literal_string[] = "hello world";
char* string = (char*) literal_string;
The reason C allows this without a cast is mainly to let ancient code continue compiling. You should pretend that the type of a string literal in your source code is const char[], which can convert to const char*, but never convert it to a char*.
In the second example:
char *string = "hello world";
char *result = strtok(string, " ");
the pointer string is pointing to a string literal, which cannot be modified (as strtok() would like to do).
You could do something along the lines of:
char *string = strdup("hello world");
char *result = strtok(string, " ");
so that string is pointing to a modifiable copy of the literal.
strtok modifies the string you pass to it (or tries to anyway). In your first code, you're passing the address of an array that's been initialized to a particular value -- but since it's a normal array of char, modifying it is allowed.
In the second code, you're passing the address of a string literal. Attempting to modify a string literal gives undefined behavior.
In the second case (char *), the string is in read-only memory. The correct type of string constants is const char *, and if you used that type to declare the variable you would get warned by the compiler when you tried to modify it. For historical reasons, you're allowed to use string constants to initialize variables of type char * even though they can't be modified. (Some compilers let you turn this historic license off, e.g. with gcc's -Wwrite-strings.)
The first case creates a (non const) char array that is big enough to hold the string and initializes it with the contents of the string. The second case creates a char pointer and initializes it to point at the string literal, which is probably stored in read only memory.
Since strtok wants to modify the memory pointed at by the argument you pass it, the latter case causes undefined behavior (you're passing in a pointer that points at a (const) string literal), so it's unsurprising that it crashes.
Because the second one declares a pointer (that can change) to a constant string...
So depending on your compiler / platform / OS / memory map... the "hello world" string will be stored as a constant (in an embedded system, it may be stored in ROM) and trying to modify it will cause that error.