I am a noob to C programming (I come from the lands of JS and PHP), and as a learning exercise I attempted to write a program that asks for the user's name, and then prints it back out with the small exception of changing the first letter to a z. However, when I went to compile the code it returned the following error message in reference to the line name[0] = "Z";
warning: assignment makes integer from pointer without a cast
Is there a reason I can't assign a value to a specific index in a char array?
(Note: I have tried typecasting "Z" to a char but it just threw the error
warning: cast from pointer to integer of different size)
Unlike some languages that do not distinguish between strings and characters, C requires a different syntax for characters (vs. a single-character string).
You need to use single quotes:
name[0] = 'Z';
The error is quite cryptic, though. It is trying to say that "Z", a single-character C string, gets assigned to name[0], an integral type of char. C strings are arrays; arrays are convertible to pointers. Hence, C treats this as a pointer-to-int assignment without a cast.
Replace name[0] = "Z"; with name[0] = 'Z';.
Single quotes are for a single character; double quotes are for a string.
In C, single quotes and double quotes carry different meanings. In fact, there is no concept of "Strings" in C. You have the basic char data type, where a char is represented by single quotes. To represent strings, you store them as an array of chars. For example,
char text[] = {'h', 'e', 'l', 'l', 'o'};
This is just a more tedious way of writing
char text[] = "hello";
This is exactly the same as the first example, with the exception that there is a null character \0 at the end (this is how C detects the end of "strings"). It's the same as saying char text[] = {'h', 'e', 'l', 'l', 'o', '\0'}; except now you can work with your array more easily, if you want to do string based processing on it.
Coming to your question, if you want to index a certain character in a "string", you'd need to access it by its index in the array.
So, text[0] returns the character h which is of type char. To assign a different value, you must assign a single quoted char as so:
text[0] = 'Z';
In the code char * str = "hello";, I understand that code "hello" is to allocate the word hello to any other memory and then put the first value of that allocated memory into the variable str.
But when I use the code char str[10] = "hello";, I understood that the word hello is included in each element of the array.
If then, on the top, the code "hello" returns the address of the memory
and on the bottom, the code "hello" returns the word h e l l o \n.
I want to know why they are different and if I'm wrong, I want to know what double quotes return.
C is a bit quirky. You have two distinct use cases here. But let's first start with what "hello" is.
Your "hello" in the program source code is a character string literal. That is a character sequence enclosed in double quotes. When the compiler is compiling this source code, it appends a zero byte to the sequence, so that standard library functions like strlen() can work on it. The resulting zero-terminated sequence is then used by the compiler to "initialize an array of static storage duration and length just sufficient to contain the sequence" (n1570 ISO C draft, 6.4.5/6). That length is 6: the 5 characters h, e, l, l and o as well as the appended zero byte.
"Static storage duration" means that the array exists the entire time the program is running (as opposed to objects with automatic local storage duration, e.g. local variables, and those with dynamic storage duration, which are created via malloc() or calloc()).
You can memorize the address of that array, as in char *str = "hello";. This address will point to valid memory during the lifetime of the program.
The second use case is a special syntax for initializing character arrays. It is just syntactic sugar for this common use case, and a deviation from the fact that you cannot normally initialize arrays with arrays.1
This time you don't define a pointer, you define a proper array of 10 chars. You then use the string literal to initialize it. You always can use the generic method to initialize a character array by listing the individual array elements, separated by commas, in curly braces (by the way, this generic method works also for the other kind of compound types, namely structs):
char str[10] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This is entirely equivalent to
char str[10] = "hello";
Now your array has more elements (10) than the number of characters in the initializing array produced from the string literal (6); the standard stipulates that "subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration". Those global and static variables are initialized with zero, which means that the character array str ends with 4 zero characters.
It is immediately obvious why Dennis Ritchie added the somewhat anti-paradigmatic initialization of character arrays via a string literal, probably after the second time he had to do it with the generic array initialization syntax. Designing your own language has its benefits.
1 For example, static char src[] = "123"; char dest[] = src; doesn't work. You have to use strcpy().
The initialization:
char * str = "hello";
in most C implementations makes sure that the string hello is placed in a constant data section of the executable memory. Exactly six bytes are written, the last one being the string terminator '\0'.
str char pointer contains the address of the first character 'h', so that anyone accessing the string knows that the following bytes have to be read until the terminator character is found.
The other initialization
char str[10] = "hello"; // <-- string must be enclosed in double quotes
is very similar: the array str starts with the first character of the string, and the remaining characters (including the string terminator) are written to the memory locations that follow.
But:
Even if only six bytes are explicitly initialized, ten bytes are allocated because that's the size of the array. In this case, the four trailing bytes will contain zeroes
Data is not constant and can be changed, while in the previous example it wasn't possible because such initialization, in most C implementations, instructs the compiler to use a constant data section
You seem to be mixing up some things:
char str[10] = "hello';
This does not even compile: when you start with a double-quote, you should end with one:
char str[10] = "hello";
In memory, this has following effect:
str[0] : h
str[1] : e
str[2] : l
str[3] : l
str[4] : o
str[5] : 0 (the zero character constant)
str[6] : 0
str[7] : 0
str[8] : 0
str[9] : 0
(The elements not covered by the initializer are implicitly set to zero.)
As a result, the code will not return hello\n (with an end-of-line character), just hello\0 (the zero character).
The double quotes just mark the beginning and the end of a string constant; they return nothing themselves.
I am learning C and I came across the pointers.
Even though I learned more with this tutorial than from the textbook I still wonder about the char pointers.
If I program this
#include <stdio.h>
int main()
{
    char *ptr_str;
    ptr_str = "Hello World";
    printf(ptr_str);
    return 0;
}
The result is
Hello World
I don't understand how there isn't an error while compiling since the pointer ptr_str is pointing directly to the text and not to the first character of the text. I thought that only this would work
#include <stdio.h>
int main()
{
    char *ptr_str;
    char var_str[] = "Hello World";
    ptr_str = var_str;
    printf(ptr_str);
    return 0;
}
So in the first example how was I pointing directly to the text?
Your code works because string literals are essentially static arrays.
ptr_str = "Hello World";
is treated by the compiler as if it were
static char __tmp_0[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
ptr_str = __tmp_0;
(except trying to modify the contents of a string literal has undefined behavior).
You can even apply sizeof to a string literal and you'll get the size of the array: sizeof "Hello" is 6, for example.
In the context of assignment to a char pointer the 'value' of a string literal is the address of its first character.
so
ptr_str = "Hello World";
sets ptr_str to the address of the 'H'
Why won't the first one work? It will work as you have seen.
String literals are arrays. From §6.4.5p6 C11 Standard N1570
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
In the first case the literal's array decays into a pointer to its first element, so the decayed pointer points to 'H'. You assigned that pointer to ptr_str. printf then receives ptr_str as its format string; since the text contains no conversion specifiers, it prints every character until it reaches the \0. (The safer form is printf("%s", ptr_str), where %s takes a char * argument.) That's all that happened. This is how you ended up pointing directly to the text.
Note that the second case is quite different from the first: there a copy is made, and that copy can be modified (trying to modify the literal in the first case would be undefined behavior). We are basically initializing a char array with the content of the string literal.
Are C constant character strings always null terminated without exception?
For example, will the following C code always print "true":
const char* s = "abc";
if( *(s + 3) == 0 ){
    printf( "true" );
} else {
    printf( "false" );
}
A string is only a string if it contains a null character.
A string is a contiguous sequence of characters terminated by and including the first null character. C11 §7.1.1 1
"abc" is a string literal. It also always contains a null character. A string literal may contain more than 1 null character.
"def\0ghi" // 2 null characters.
In the following, though, x is not a string (it is an array of char without a null character). y and z are both arrays of char and both are strings.
char x[3] = "abc";
char y[4] = "abc";
char z[] = "abc";
With OP's code, s points to a string, the string literal "abc"; *(s + 3) and s[3] both have the value 0. Attempting to modify s[3] is undefined behavior because 1) s is a const char * and 2) the data pointed to by s is a string literal, and attempting to modify a string literal is undefined behavior.
const char* s = "abc";
Deeper: C does not define "constant character strings".
The language defines a string literal, like "abc" to be a character array of size 4 with the value of 'a', 'b', 'c', '\0'. Attempting to modify these is UB. How this is used depends on context.
The standard C library defines string.
With const char* s = "abc";, s is a pointer to data of type char. As a const some_type * pointer, using s to modify data is UB. s is initialized to point to the string literal "abc". s itself is not a string. The memory s initially points to is a string.
In short, yes. A string constant is of course a string and a string is by definition 0-terminated.
If you use a string constant as an array initializer like this:
char x[5] = "hello";
you won't have a 0 terminator in x simply because there's no room for it.
But with
char x[] = "hello";
it will be there and the size of x is 6.
A string is defined as a sequence of characters terminated by a zero character. It does not matter whether the sequence is modifiable, that is, whether the corresponding declaration has the qualifier const or not.
For example string literals in C have types of non-constant character arrays. So you may write for example
char *s = "Hello world";
In this declaration the identifier s points to the first character of the string.
You can initialize a character array yourself by a string using a string literal. For example
char s[] = "Hello world";
This declaration is equivalent to
char s[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
However in C you may exclude the terminating zero from an initialization of a character array.
For example
char s[11] = "Hello world";
Though the string literal used as the initializer contains the terminating zero it is excluded from the initialization. As result the character array s does not contain a string.
In C, there isn't really a "string" datatype like in C++ and Java.
Important principle that every competent computer science degree program should mention: Information is symbols plus interpretation.
A "string" is defined conventionally as any sequence of characters ending in a null byte ('\0').
The "gotcha" that's being posted (character/byte arrays with the value 0 in the middle of them) is only a difference of interpretation. Treating a byte array as a string versus treating it as bytes (numbers in [0, 255]) has different applications. Obviously if you're printing to the terminal you might want to print characters until you reach a null byte. If you're saving a file or running an encryption algorithm on blocks of data you will need to support 0's in byte arrays.
It's also valid to take a "string" and optionally interpret as a byte array.
I'm using a template from my teacher and at the beginning of the code it says:
#include "lab8.h"
void main(void)
{
int response;
int count;
string words[MAX_COUNT];
Later on in the function, a whole lot of words get put inside the words string. So I was like looking at that last line and got confused. I thought char declared strings? What does that last line even do? I also noticed in a couple of function parameter lists later on, there was entered "string words" instead of what I expected that mention char or something.
EDIT:
typedef char string[MAX_LENGTH];
had been written in the .h file; I didn't see it.
C does not have a basic data type called string.
Check the lab8.h file carefully. string is most likely a typedef defined there, typically for an array of char.
Essentially, string words[MAX_COUNT]; defines an array of MAX_COUNT elements, each of type string.
C does not have a dedicated string data type. In C, a string is a sequence of character values followed by a zero-valued byte. Strings are stored as arrays of char, but not all arrays of char contain strings.
For example,
char word[] = { 'h', 'e', 'l', 'l', 'o', 0 };
stores the string "hello" in the array variable word. The array size is taken from the size of the initializer, which is 6 (5 characters plus the 0 terminator). The zero-valued byte serves as a sentinel value for string handling functions like strlen, strcpy, strcat, and for arguments to printf and scanf that use the %s and %[ conversion specifiers.
By contrast,
char arr[] = { 'h', 'e', 'l', 'l', 'o' };
stores a sequence of character values, but since there's no terminating 0-valued byte, this sequence is not considered a string, and you would not want to use it as an argument to any string-handling function (since there's no terminator, the function has no way of knowing where the string ends and will wind up attempting to access memory outside of the array, which can lead to anything from garbage output to a crash).
Without seeing the contents of lab8.h, I'm going to speculate that the string type is a typedef for an array of char, something like
#define MAX_STRING_LENGTH 20 // or some other value
typedef char string[MAX_STRING_LENGTH];
Thus, an array of string is an array of arrays of char; it would be equivalent to
char words[MAX_COUNT][MAX_STRING_LENGTH];
So each words[i] is a MAX_STRING_LENGTH-element array of char.
Consider following code:
// hacky, since "123" is 4 chars long (including terminating 0)
char symbols[3] = "123";
// clean, but lot of typing
char symbols[3] = {'1', '2', '3'};
so, the twist is actually described in comment to the code, is there a way to initialize char[] with string literal without terminating zero?
Update: seems like IntelliSense is wrong indeed, this behaviour is explicitly defined in C standard.
This
char symbols[3] = "123";
is a valid statement.
According to the ANSI C Specification of 1988:
An array of character type may be initialized by a character string
literal, optionally enclosed in braces. Successive characters of the
character string literal (including the terminating null character if
there is room or if the array is of unknown size) initialize the
members of the array.
Therefore, what you're doing is technically fine.
Note that character arrays are an exception to the stated constraints on initializers:
There shall be no more initializers in an initializer list than there
are objects to be initialized.
However, the technical correctness of a piece of code is only a small part of that code's "goodness". The line char symbols[3] = "123"; will immediately strike the veteran programmer as suspect because it appears, at face value, to be a valid string initialization and later may be used as such, leading to unexpected errors and certain death.
If you wish to go this route you should be sure it's what you really want. Saving that extra byte is not worth the trouble this could get you into. The null terminator, if anything, allows you to write better, more flexible code because it provides an unambiguous (in most instances) way of terminating the array.
(Draft specification available here.)
To co-opt Rudy's comment elsewhere on this page, the C99 Draft Specification's 32nd Example in §6.7.8 (p. 130) states that the lines
char s[] = "abc", t[3] = "abc";
are identical to
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
From which you can deduce the answer you're looking for.
The C99 specification draft can be found here.
If your array is only 3 chars long, the first line of code is identical to the second line. The '\0' at the end of the string will simply not be stored. IOW, there is nothing "dirty" or "wrong" with it.
1) The problems you are mentioning are not problems.
2) Que: Is there a way to initialize char[] with string literal without terminating zero? -- you are already doing that.