Firstly, I tagged this C++ as well, since C++ is descended from C and I'm guessing answers for both apply here, although the language I'm asking about and focusing on in this question is C, not C++.
So I began reading the C book 'Head First C' not so long ago. In the book (page 43/278) it poses a question: are there any differences between literal strings and character arrays?
I was totally thrown by this as I didn't know what a literal string was. I understand a string is just an array of characters, but what makes a 'string' literal? And why does it mention strings in C if C doesn't actually provide any class for strings (like a modern language such as C# or Java would)?
Can anyone help clear up this confusion? I really struggle to understand what Microsoft's documentation has to say about this, and I think I need a simpler explanation.
A string literal is an unnamed string constant in the source code. E.g. "abc" is a string literal.
If you do something like char str[] = "abc";, then you could say that str is initialized with a literal. str itself is not a literal, since it's not unnamed.
A string (or C-string, rather) is a contiguous sequence of bytes, terminated with a null byte.
A char array is not necessarily a C-string, since it might lack a terminating null byte.
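To make that last point concrete, here is a minimal sketch. The helper name `is_c_string` is hypothetical (it is not from the answer above); it simply checks whether a char buffer of known size contains a terminating null byte within its bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if the first n bytes of buf contain
   a null byte, i.e. buf holds a C-string within its bounds. */
bool is_c_string(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```

Called with `char str[] = "abc";` and `sizeof str` it should report true, while for `char raw[3] = {'a', 'b', 'c'};` it should report false, because that array has no room for a terminator.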
What is a literal string & char array in C?
C has 2 kinds of literals: string literals and compound literals. Both are unnamed and both can have their address taken. String literals can have more than 1 null character in them.
In the C library, a string is a sequence of characters up to and including the first null character. So a string always has one and only one null character, else it is not a string. The characters of a string may be char, signed char or unsigned char.
//         v-----v string literal, 6 char long
char *s1 = "hello";
char *s2 = "hello\0world";
//         ^------------^ string literal, 12 char long
char (*s3)[6] = &"hello"; // valid: pointer to an array of 6 char
//        v------------v compound literal
int *p1 = (int []){2, 4};
int (*p2)[2] = &(int []){2, 4}; // valid: pointer to an array of 2 int
C classifies things like 123, 'x' and 456.7 as constants, not literals. These constants cannot have their address taken.
int *p3 = &7; // not valid
C++ and C differ in many of these regards.
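The difference between the size of a string literal and the length of the string it starts with can be checked directly. This is only a sketch; the function names `count_nulls`, `literal_size` and `string_length` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the null characters in the first n bytes of buf. */
size_t count_nulls(const char *buf, size_t n)
{
    assert(buf != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\0') {
            count++;
        }
    }
    return count;
}

/* Size of the literal's array: 11 written characters plus the implicit
   trailing null character. */
size_t literal_size(void)
{
    return sizeof "hello\0world";
}

/* Length of the string the literal starts with: strlen stops at the
   first null character. */
size_t string_length(void)
{
    return strlen("hello\0world");
}
```

Here `literal_size()` should give 12 and `string_length()` should give 5, and counting the nulls over all 12 bytes should give 2: one literal, but only the first six bytes form a string in the library's sense.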
A char array is an array of char. An array may contain many null characters, or none.
char a1[3]; // `a1` is a char array size 3
char a2[3] = "123"; // `a2` is a char array size 3 with 0 null characters
char a3[4] = "456"; // `a3` is a char array size 4
char a4[] = "789"; // `a4` is a char array size 4
char a5[4] = { 0 }; // `a5` is a char array size 4, all null characters
The following t* are not char arrays, but pointers to char.
char *t1;
char *t2 = "123";
char *t3 = &(char){'x'};
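One way to see the array/pointer distinction is through `sizeof`. The sketch below reuses the declarations of `a4` and `t2` from above inside two hypothetical functions (`size_of_array` and `size_of_pointer` are illustrative names only).

```c
#include <assert.h>
#include <stddef.h>

/* The literal itself is an array of 4 char, terminator included. */
static_assert(sizeof "789" == 4, "string literal includes its terminator");

/* sizeof applied to an array gives the size of the whole array. */
size_t size_of_array(void)
{
    char a4[] = "789";        /* array of 4 char: '7', '8', '9', '\0' */
    return sizeof a4;
}

/* sizeof applied to a pointer gives the size of the pointer itself,
   no matter how long the string it points to is. */
size_t size_of_pointer(void)
{
    const char *t2 = "123";   /* pointer to the first char of the literal */
    return sizeof t2;
}
```

The first function should return 4; the second returns `sizeof(char *)`, which is commonly 4 or 8 depending on the platform.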
We can assign a string in C as follows:
char *string;
string = "Hello";
printf("%s\n", string); // string
printf("%p\n", string); // memory-address
And a number can be done as follows:
int num = 4404;
int *nump = &num;
printf("%d\n", *nump);
printf("%p\n", nump);
My question then is, why can't we assign a pointer to a number in C just like we do with strings? For example, doing:
int *num;
num = 4404;
// and the rest...
What makes a string fundamentally different than other primitive types? I'm quite new to C so any explanation as to the difference between the two would be very helpful.
There is no such type as "string" in C. A string is not a primitive type. A string is just an array of characters, terminated by a NUL byte ('\0').
When you do this:
char *string;
string = "Hello";
What really happens is that the compiler creates a read-only char array and then assigns the address of its first element to your variable string. This can be done because in C an array used in an expression is, in most contexts, converted to a pointer to its first element.
// This is placed in a different section:
const char hidden_arr[] = {'H', 'e', 'l', 'l', 'o', '\0'};
char *string;
string = hidden_arr;
// Same as:
string = &(hidden_arr[0]);
Here, string is a char * and hidden_arr is an array of char, but, as we just said, the array is converted to a pointer to its first element when it is assigned. Of course, all of this is done transparently; you will not actually see another variable named hidden_arr, that's just an example. In reality the string will be stored in some location in your executable without a name, and the address of that location will be copied to your string pointer.
When you try to do the same with an integer, it's wrong because int * and int are different types, and you cannot write this (well, you can, but it's meaningless and does not do what you expect it to):
int *ptr;
ptr = 123;
But, you can very well do it with an array of integers:
int arr[] = {1, 2, 3};
int *ptr;
ptr = arr;
// Same as:
ptr = &(arr[0]);
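If the goal is a pointer to a single number without naming a variable first, C99 and later offer compound literals, which play a role for other types similar to the one string literals play for char arrays. A minimal sketch, with `read_through_pointer` as a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value reached through a pointer to an unnamed int object. */
int read_through_pointer(void)
{
    /* (int){4404} is a compound literal: an unnamed int object whose
       address can be taken, much like the unnamed array behind "Hello".
       Inside a function it lives until the enclosing block ends. */
    int *nump = &(int){4404};
    assert(nump != NULL);
    return *nump;
}
```

The function should return 4404. Note the lifetime difference: a string literal exists for the whole program, whereas this compound literal disappears when the function returns, so the pointer must not be returned or stored.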
why can't we assign a pointer to a number in C just like we do with strings?
int *num;
num = 4404;
Code can do that if 4404 is a valid address for an int.
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
C11dr §6.3.2.3 5
If the address is not properly aligned --> undefined behavior (UB).
If the address is a trap --> undefined behavior (UB).
Attempting to de-reference the pointer is a problem unless it points to a valid int.
printf("%d\n", *num);
In the code below, "Hello" is a string literal. It exists someplace. The assignment takes the address of the string literal and assigns that to string.
char *string;
string = "Hello";
The point is that the address assigned is known to be valid for a char *.
With num = 4404;, the address is not known to be valid (it likely is not).
What makes a string fundamentally different than other primitive types?
In C, a string is a C library specification, not a C language one. It is a definition convenient for explaining the various functions therein.
A string is a contiguous sequence of characters terminated by and including the first null character §7.1.1 1
Primitive types are part of the C language.
The language also has string literals like "Hello" in char *string; string = "Hello";. These have some similarity to strings, yet differ.
I recommend searching for "ISO/IEC9899:2017" to find a draft copy of the current C spec. It will answer many of your 10 questions of the last week.
What makes a string fundamentally different than other primitive types?
A string seems like a primitive type in C because the compiler understands "foo" and generates a null-terminated character array: ['f', 'o', 'o', '\0']. But a C string is still just that: an array of characters.
My question then is, why can't we assign a pointer to a number in C just like we do with strings?
You certainly can assign a pointer to a number, it's just that a number isn't a pointer, whereas the value of an array is the address of the array. If you had an array of int, then that would work just like a string. Compare your code:
char *string;
string = "Hello";
printf("%s\n", string); // string
printf("%p\n", string); // memory-address
to the analogous code for an array of integers:
int numbers[] = {1, 2, 3, 4, 5, 0};
int *nump = numbers;
printf("%d\n", nump[0]); // first element
printf("%p\n", nump); // memory-address
The only real difference is that the compiler has some extra syntax for arrays of characters because they're so common, and printf() similarly has a format specifier just for character arrays for the same reason.
The type of a string literal (e.g. "hello world") is an array of char. Assigning char *string = "Hello" means that string now points to the start of that array (i.e. the address of its first element, &"Hello"[0]).
Whereas you can't assign an integer to a pointer because their types are different: one is an int, the other is a pointer, int *. You could cast it to the correct type:
int *num;
num = (int *) 4404;
But this would be considered quite dangerous (unless you really know what you are doing). I.e. do you know what is at memory address 4404?
I've studied C programming at university for 4 months. My professor always said that strings don't really exist. Since I finished those 2 small courses, I really started programming (java). I can't remember WHY strings don't really exist. I wasn't concerned about this before, but I'm curious now. Why don't they exist? And do they exist in Java? I know it has to do something with that "under the hood strings are just characters", but does that mean that strings are all saved as multiple characters etc? And doesn't that take more memory?
A string type does not exist in C, but C strings do exist. They are defined as null-terminated character arrays. For example:
char buffer1[] = "this is a C string";//string literal
creates a C string that looks like this in memory:
|t|h|i|s| |i|s| |a| |C| |s|t|r|i|n|g|\0|?|?|?|
< string >
Note that this is not a string:
char *buffer2;
Until it contains a series of char terminated by a \0, it is just a pointer to char. (char *)
buffer2 = calloc(strlen(buffer1)+1, 1);
strcpy(buffer2, buffer1); //now buffer2 is pointing to a string
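The two lines above can be wrapped into a small function that also checks the allocation. This is a sketch under assumptions: `dup_string` is a hypothetical name, and it uses malloc plus memcpy rather than calloc plus strcpy, which has the same effect here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the string src, or NULL if
   the allocation fails. The caller owns the result and must free() it. */
char *dup_string(const char *src)
{
    assert(src != NULL);
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, size);         /* copies the terminator too */
    return copy;
}
```

After `char *buffer2 = dup_string(buffer1);` succeeds, buffer2 points to a string; before that, or if the call returns NULL, it is still just a pointer to char.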
Edit:
(to address discussion in comments on strings:)
Based on the following definition:
Strings are actually one-dimensional array of characters terminated by
a null character '\0'.
First, since null termination is integral to a conversation about C strings, here are some clarifications:
The term NULL is a null pointer constant, typically defined as ((void*)0) or
just 0. It can be, and typically is, used to initialize pointer
variables.
The term '\0' is a character constant. In C, it means exactly the same
thing as the integer constant 0 (same value 0, same type
int). It is used to initialize char arrays.
Things that are strings:
char string[] = {'\0'}; //zero length or _empty_ string with `sizeof` 1.
In memory:
|\0|
...
char string[10] = {'\0'}; //also zero length or _empty_, with `sizeof` 10.
In memory:
|\0|\0|\0|\0|\0|\0|\0|\0|\0|\0|
...
char string[] = {"string"}; //string of length 6, and `sizeof` 7.
In memory:
|s|t|r|i|n|g|\0|
...
char string[2][5] = {{0}}; //2 strings, each with zero length, and `sizeof` 5.
In memory:
|0|0|0|0|0|0|0|0|0|0| (note 0 equivalent to \0)
...
char *buf = {"string"};//string literal.
In memory:
|s|t|r|i|n|g|\0|
Things that are not strings:
char buf[6] = {"string"};//space for 6, but "string" requires 7 for null termination.
In Memory:
|s|t|r|i|n|g| //no null terminator
|end of space in memory.
...
char *buf = {0};//pointer to char (`char *`).
In memory:
|0| //null initiated pointer residing at address of `buf` (eg. 0x00123650)
Strings don't exist in C as a data type. There is int, char, float, etc., but no "string".
This means you can declare a variable as an int, but not as a "string" because there is no data type named "string" .
The closest C has to a string is an array of chars, or a char * to a section of memory. The actual string is up to the programmer to define, as a sequence of chars terminated with a \0, or a number of chars with a known upper bound.
Are C constant character strings always null terminated without exception?
For example, will the following C code always print "true":
const char* s = "abc";
if( *(s + 3) == 0 ){
printf( "true" );
} else {
printf( "false" );
}
A string is only a string if it contains a null character.
A string is a contiguous sequence of characters terminated by and including the first null character. C11 §7.1.1 1
"abc" is a string literal. It also always contains a null character. A string literal may contain more than 1 null character.
"def\0ghi" // 2 null characters.
In the following, though, x is not a string (it is an array of char without a null character). y and z are both arrays of char and both are strings.
char x[3] = "abc";
char y[4] = "abc";
char z[] = "abc";
With OP's code, s points to a string, the string literal "abc"; *(s + 3) and s[3] have the value 0. Attempting to modify s[3] is undefined behavior, as 1) s is a const char * and 2) the data pointed to by s is a string literal, and attempting to modify a string literal is also undefined behavior.
const char* s = "abc";
Deeper: C does not define "constant character strings".
The language defines a string literal, like "abc" to be a character array of size 4 with the value of 'a', 'b', 'c', '\0'. Attempting to modify these is UB. How this is used depends on context.
The standard C library defines string.
With const char* s = "abc";, s is a pointer to data of type char. As a const some_type * pointer, using s to modify data is UB. s is initialized to point to the string literal "abc". s itself is not a string. The memory s initially points to is a string.
In short, yes. A string constant is of course a string and a string is by definition 0-terminated.
If you use a string constant as an array initializer like this:
char x[5] = "hello";
you won't have a 0 terminator in x simply because there's no room for it.
But with
char x[] = "hello";
it will be there and the size of x is 6.
A string is defined as a sequence of characters terminated by a zero character. It is not important whether the sequence is modifiable or not, that is, whether the corresponding declaration has the qualifier const or not.
For example string literals in C have types of non-constant character arrays. So you may write for example
char *s = "Hello world";
In this declaration the identifier s points to the first character of the string.
You can initialize a character array yourself by a string using a string literal. For example
char s[] = "Hello world";
This declaration is equivalent to
char s[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
However in C you may exclude the terminating zero from an initialization of a character array.
For example
char s[11] = "Hello world";
Though the string literal used as the initializer contains the terminating zero it is excluded from the initialization. As result the character array s does not contain a string.
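Such an array can still be printed safely if the number of characters is passed explicitly. The sketch below assumes a hypothetical function `format_unterminated`; it relies on the standard `%.*s` precision, which limits how many bytes `%s` reads. The array is initialized with a brace list here, which yields the same 11 characters as the declaration above.

```c
#include <assert.h>
#include <stdio.h>

/* Formats an unterminated 11-char array into out and returns
   snprintf's result (the number of characters formatted). */
int format_unterminated(char *out, size_t out_size)
{
    assert(out != NULL);
    /* 11 characters, no terminating '\0': an array of char, not a string. */
    char s[11] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };
    /* The precision caps how many bytes %s reads, so no terminator is needed. */
    return snprintf(out, out_size, "%.*s", (int)sizeof s, s);
}
```

With a sufficiently large `out`, the result is a properly terminated copy of the 11 characters. Passing `s` to a plain `%s` or to strlen would read past the end of the array, which is undefined behavior.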
In C, there isn't really a "string" datatype like in C++ and Java.
Important principle that every competent computer science degree program should mention: Information is symbols plus interpretation.
A "string" is defined conventionally as any sequence of characters ending in a null byte ('\0').
The "gotcha" that's being posted (character/byte arrays with the value 0 in the middle of them) is only a difference of interpretation. Treating a byte array as a string versus treating it as bytes (numbers in [0, 255]) has different applications. Obviously if you're printing to the terminal you might want to print characters until you reach a null byte. If you're saving a file or running an encryption algorithm on blocks of data you will need to support 0's in byte arrays.
It's also valid to take a "string" and optionally interpret as a byte array.
If this code is correct:
char v1[ ] = "AB";
char v2[ ] = {"AB"};
char v3[ ] = {'A', 'B'};
char v4[2] = "AB";
char v5[2] = {"AB"};
char v6[2] = {'A', 'B'};
char *str1 = "AB";
char *str2 = {"AB"};
Then why this other one is not?
char *str3 = {'A', 'B'};
To the best of my knowledge (please correct me if I'm wrong at any point) "AB" is a string literal and 'A' and 'B' are characters (integers,scalars). In char *str1 = "AB"; the string literal "AB" is defined and the char pointer is set to point to that string literal (to the first element). With char *str3 = {'A', 'B'}; two characters are defined and stored in subsequent memory positions, and the char pointer "should" be set to point to the first one. Why is that not correct?
In a similar way, a regular char array like v3[] or v6[2] can indeed be initialized with {'A', 'B'}. The two characters are defined, the array is set to point to them and thus, being "turned into" or treated like a string literal. Why a char pointer like char *str3 does not behave in the same way?
Just for the record, gcc compiler warnings I get are "initialization makes pointer from integer without a cast" when it gets to the 'A', and "excess elements in scalar initializer" when it gets to the 'B'.
Thanks in advance.
There is one thing you need to learn about constant string literals. Except when used to initialize an array (for example in the case of v1 in your example code) constant string literals are themselves arrays. For example if you use the literal "AB" it is stored somewhere by the compiler as an array of three characters: 'A', 'B' and the terminator '\0'.
When you initialize a pointer to point to a literal string, as in the case of str1 and str2, then you are making those pointers point to the first character in those arrays. You don't actually create an array named str1 (for example) you just make it point somewhere.
The definition
char *str1 = "AB";
is equivalent to
char *str1;
str1 = "AB";
Or rather
char unnamed_array_created_by_compiler[] = "AB";
char *str1 = unnamed_array_created_by_compiler;
There are also other problematic things with the definitions you show. First of all the arrays v3, v4, v5 and v6. You tell the compiler they will be arrays of two char elements. That means you cannot use them as strings in C, since strings need the special terminator character '\0'.
In fact if you check the sizes of v1 and v2 you will see that they are indeed three bytes large, one for each of the characters plus the terminator.
Another important thing you miss is the constant part: while string literals are arrays of char, they are really read-only, even if not stored as such. That's why you should never create a pointer to char (like str1 and str2) to point to them; you should create pointers to constant char. I.e.
const char *str1 = "AB";
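As for making a pointer point at characters written as 'A' and 'B': a bare brace list cannot do that, because a pointer is a scalar and the braces do not create an array. A C99 compound literal does create one. A minimal sketch, with `length_of_str3` as a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Builds the string "AB" from character constants and returns its length. */
size_t length_of_str3(void)
{
    /* (char[]){...} creates an unnamed, modifiable char array; like any
       array it converts to a pointer to its first element, here 'A'.
       The explicit '\0' is what makes the array a string. */
    char *str3 = (char[]){ 'A', 'B', '\0' };
    assert(str3[0] == 'A');
    return strlen(str3);
}
```

The function should return 2. Unlike a string literal, the array behind this compound literal may be modified, but inside a function it only lives until the enclosing block ends.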
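Double quotes (" ") are for a string literal and single quotes (' ') are for a character constant. For a string literal, memory has been allocated (an unnamed array); for a character constant it has not, since it is just a value. A pointer has to point to some memory, so you must provide an object for it to point to; for an array of characters this is not necessary, because the array has its own storage.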
I'm having a problem casting, so is there a way to cast a type of:
char *result;
to a type of
char *argv[100];
?
If so, how would I do this or is there a safe way to do this?
char * result is a string
and
char * argv[100] is array of strings.
You cannot convert string into array of strings. However, you can create an array of strings where the first array value is result.
argv[0] = result;
char *result is a pointer to a char
char *argv[100] is an array of char *, which in most expressions is converted to a char ** (a pointer to its first pointer)
Keep this in mind:
int* arr[8]; // An array of int pointers.
int (*arr)[8]; // A pointer to an array of integers
This being the case, this is probably not what you want to be doing. I suppose the next question is: What were you trying to do? Or why?
What does result contain, and what do you expect argv to contain after the conversion?
For example, if result points to a list of strings separated by a delimiter, like "foo,bar,bletch,blurga,..", and you want each string to be a separate element in argv, like
argv[0] == "foo"
argv[1] == "bar"
argv[2] == "bletch"
argv[3] == "blurga"
then you could not accomplish this with a simple cast; you'd have to actually scan result and assign individual pointers to argv.
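A sketch of such a scan, assuming the delimiter-separated layout described above. The function name `split_into_argv` and its parameters are hypothetical; it uses the standard strtok, which modifies the buffer in place, so the text must not be a string literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits text in place at any character in delims and
   stores a pointer to each token in argv. Returns the number of tokens
   stored (at most max - 1) and sets argv[count] to NULL. */
size_t split_into_argv(char *text, const char *delims, char *argv[], size_t max)
{
    assert(text != NULL && delims != NULL && argv != NULL && max > 0);
    size_t count = 0;
    char *token = strtok(text, delims);
    while (token != NULL && count < max - 1) {
        argv[count++] = token;          /* points into text, no copy made */
        token = strtok(NULL, delims);
    }
    argv[count] = NULL;                 /* conventional end marker */
    return count;
}
```

Given `char result[] = "foo,bar,bletch,blurga";` and `char *argv[100];`, calling `split_into_argv(result, ",", argv, 100)` should store four pointers. The tokens live inside result, so argv is only valid as long as result is.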
First, a short explanation:
char *result;
The variable result is a pointer and when set it will point to a single character. However, as you know, a single character can be the start of a string that ends with the null (\0) character.
In C, a good programmer can use the pointer result to index through a string.
However, the string's length is NOT known until the pointer reaches a null character.
It is possible to define a fixed length string of characters, in this case code:
char s[100];
Now, the fun begins. Per Kernighan and Ritchie (K&R), s used in an expression is converted to a pointer to the first character of the array (which holds a string only once it contains a terminating 0).
So, you can code:
s[0] = 'a';
*s = 'a';
s[1] = 'b';
*(s+1) = 'b';
Each pair of statements is equivalent.
As mentioned in other posts, let's add explicit parens to your argv statement:
char *(argv[100]);
Thus, this is an array of 100 pointers to characters (each of which might or might not be the start of a string of characters).