What is the difference between chars and strings? - c

I am working on cs50 coding exercise but I don't get strings and chars. I have problems with them cause I can't understand what they are. I don't know the difference between chars and strings cause they seem the same to me. My code is:
#include <cs50.h>
#include <stdio.h>
int main(void){
char c = 'A';
printf("%c", c);
string a = "A";
printf("%s", a);
}
but it prints AA without changes whether it is a char or string.
If I do this:
#include <stdio.h>
int main(void){
char c = 'A';
printf("%c", c);
char a = 'A';
printf("%c", a);
}
it still prints AA. Even if I do this:
#include <cs50.h>
#include <stdio.h>
int main(void){
string c = "A";
printf("%s", c);
string a = "A";
printf("%s", a);
}
It STILL prints AA, even if I swap from chars to strings. I don't see a difference at all! Please help me understand.
I change from chars to strings but the result doesn't change. Please help me figure it out.

A char represents a single character. A string is a series of characters.

string is an alias of String, in which we can store a collection of chars (a word), like
string name = "mukesh"; but in a char we can store a single character only, like
char ch='b';

A string is an array of chars.
A char is just one character or letter in english.
Like examples of chars are 'A', 'B', 'w', '3', '\n' etc.
In C/C++, a char takes 1 byte of space and is within single quotes.
Special characters like '\n', '\0' are also chars.
They take 1 byte of storage and can be represented as an integer also. Search ASCII on google to understand the integer relation.
string is a collection of chars.
Examples are "John", "Hello", "What is an Apple?".
The chars inside string "John" are J,o,h,n,\0. The length of the string is 4, but it occupies 5 bytes because of the \0. A string always terminates with a \0 char.
strings are inside double quotes in C/C++.

There are a few possible common-use definitions of string (the C standard defines "A string is a contiguous sequence of characters terminated by and including the first null character.", see C11 7.1.1)
a) an array of char values of which the last one has value 0, eg
(char[4]){'b', 'a', 'r', '\0'} // a) string
b) an array of char values that contains a '\0', eg
(char[7]){'f', 'o', 'o', '\0', 'x', 'y', 'z'} // not an a) string; b) string
this last string can "grow" because the underlying array has enough space
c) a pointer to one of the above or to somewhere in the middle of such an array, eg
char array[10] = "foo\0bar\0X"; // not an a) string
char *p = &array[4]; // not an a) string, not a b) string
This last array is not a string by definition a), but it is a string by definition b); p points to the 'b': that pointer is a string by definition c).
Always make sure you know what kind of string you're discussing!!
Of these definitions, c) is the one farthest from the C Standard definition.

Related

'\0' and printf() in C

In an introductory C course, I learned that strings are stored with a null character \0 at the end. But what if I want to print a string, say printf("hello")? I've found that it doesn't seem to end with \0, judging by the following statement
printf("%d", printf("hello"));
Output: 5
But this seems inconsistent. As far as I know, variables like strings are stored in main memory, and I guess whatever is being printed is also stored in main memory, so why the difference?
The null byte marks the end of a string. It isn't counted in the length of the string and isn't printed when a string is printed with printf. Basically, the null byte tells functions that do string manipulation when to stop.
Where you will see a difference is if you create a char array initialized with a string. Using the sizeof operator will reflect the size of the array including the null byte. For example:
char str[] = "hello";
printf("len=%zu\n", strlen(str)); // prints 5
printf("size=%zu\n", sizeof(str)); // prints 6
printf returns the number of characters printed. '\0' is not printed; it just signals that there are no more chars in this string. It is not counted towards the string length either.
#include <stdio.h>
#include <string.h>
int main(void)
{
char string[] = "hello";
printf("sizeof(string) = %zu, strlen(string) = %zu\n", sizeof(string), strlen(string));
}
https://godbolt.org/z/wYn33e
sizeof(string) = 6, strlen(string) = 5
Your assumption is wrong. Your string indeed ends with a \0.
It consists of 5 characters h, e, l, l, o and the 0 character.
What the "inner" printf() call returns is the number of characters that were printed, and that's 5.
In C all literal strings are really arrays of characters, which include the null-terminator.
However, the null terminator is not counted in the length of a string (literal or not), and it's not printed. Printing stops when the null terminator is found.
All answers are really good but I would like to add another example to complete all these
#include <stdio.h>
int main()
{
char a_char_array[12] = "Hello world";
printf("%s", a_char_array);
printf("\n");
a_char_array[4] = 0; //0 is ASCII for null terminator
printf("%s", a_char_array);
printf("\n");
return 0;
}
For those who don't want to try this in an online gdb, the output is:
Hello world
Hell
https://linux.die.net/man/3/printf
Is this helpful to understand what the null terminator does? It's not a boundary for a char array or a string. It's the character that tells whoever parses the string: STOP, parse (print) up to here.
PS: And if you parse and print it as a char array
for (int i = 0; i < 12; i++)
{
printf("%c", a_char_array[i]);
}
printf("\n");
you get:
Hell world
where the gap after the double l is the null terminator. Walking the char array byte by byte just prints the char value of every byte, terminator included. If you loop again and print the int value of each byte (printf("%d", a_char_array[i])), you'll see the ASCII code (the int representation) of each character, and the byte in that gap has the value 0.
In C, printf() returns the number of characters printed. \0 is the null terminator, which indicates the end of a string in C; there is no built-in string type as there is in C++. Your array size needs to be at least one greater than the number of chars you want to store.
Here is the ref: cpp ref printf()
But what if I wanted to print a string, say printf("hello") although
I've found that that it doesn't end with \0 by following statement
printf("%d", printf("hello"));
Output: 5
You are wrong. This statement does not confirm that the string literal "hello" does not end with the terminating zero character '\0'. This statement confirmed that the function printf outputs elements of a string until the terminating zero character is encountered.
When you are using a string literal as in the statement above then the compiler
creates a character array with the static storage duration that contains elements of the string literal.
So in fact this expression
printf("hello")
is processed by the compiler something like the following
static char string_literal_hello[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
printf( string_literal_hello );
You can imagine the action of the function printf in this case in the following way
int printf( const char *string_literal )
{
int result = 0;
for ( ; *string_literal != '\0'; ++string_literal )
{
putchar( *string_literal );
++result;
}
return result;
}
To get the number of characters stored in the string literal "hello" you can run the following program
#include <stdio.h>
int main(void)
{
char literal[] = "hello";
printf( "The size of the literal \"%s\" is %zu\n", literal, sizeof( literal ) );
return 0;
}
The program output is
The size of the literal "hello" is 6
You have to get the concept clear first.
It will become clearer once you work with arrays. The printf call you are using just returns the count of the characters it printed, and the terminating \0 is not one of them. A string stored in an array must end with \0.
A string is a vector of characters. It contains the sequence of characters that form the
string, followed by the special string-ending character '\0'.
Example:
char str[10] = {'H', 'e', 'l', 'l', 'o', '\0'};
Example: the following character vector is not a string because it doesn't end with '\0'
char str[2] = {'h', 'e'};

How to print char array in C, like 'Arrays.ToString(array)' of Java?

I want to print char array in C like Arrays.ToString(array); of Java does. It prints what I want but puts some characters at the end. I guess it's because of the special character \0.
I declared a char array char letters[] = {'g','y','u','c','n','e'};
And tried to print: printf("\n [%s]:", letters);
The output is: [gyucneÇ_=]
Here is the Java code:
char[] letters= {'g','y','u','c','n','e'};
System.out.print( Arrays.toString(letters) );
The output is:
[g, y, u, c, n, e]
I wanted to have the output of Java code. I wonder if I want it to contain commas too, do I have to print the characters one by one or can I print it at once ?
And of course my priority is to remove the special character that is printed at the end of C code.
Print each letter on its own. You do not have a string. You cannot pass it to most functions from <string.h>, to printf() with %s, or to a bunch of others that expect a string.
char letters[] = {'g', 'y', 'u', 'c', 'n', 'e'}; // ATTENTION: letters is not a string!
for (size_t i = 0; i < sizeof letters; i++) {
putchar(letters[i]);
}
putchar('\n'); // end with a newline
I declared a char array: char letters[] = {'g','y','u','c','n','e'};
But that is not a C string (since it is not NUL terminated ! ). You should have coded instead:
const char letters[] = {'g','y','u','c','n','e',(char)0};
(or use '\0' instead of (char)0 ....) or better yet:
const char letters[] = "gyucne";
and both are exactly equivalent.
Then you can code something like printf("letters are %s\n", letters); since your letters are now a C string.
NB. Please read also http://utf8everywhere.org/ & How to debug small programs - both are practically very relevant for your case. See also at least some C reference site.

Maximum number of elements that can be stored in an array in c

Please pardon me if it is a copy question. I will be happy to delete it if pointed out.
The question is that, if I declare a character array in c, say
char character_array[4];
Does that mean I can only store 3 characters and one '\0' is added as the fourth character? But I have tried it and successfully added four characters into the character array. When I do that, where is the '\0' added, since I have already used up the four positions?
Well, yes, you can store any four characters. The string-termination character '\0' is a character just like any other.
But you don't have to store strings, char is a small integer so you can do:
char character_array[] = { 1, 2, 3, 4 };
This uses all four elements, but doesn't store printable characters nor any termination; the result is not a C string.
If you want to store a string, you need to accommodate the terminator character of course, since C strings by definition always end with the termination character.
C does not have protection against buffer overflow, if you aim at your foot and pull the trigger it will, in general, happily blow it off for you. Some of us like this. :)
You mix two notions: the notion of arrays and the notion of strings.
In this declaration
char character_array[4];
there is declared an array that can store 4 objects of type char. It is not important what values the objects will have.
On the other hand the array can contain a string: a sequence of characters limited with a terminating zero.
For example you can initialize the array above in C the following way
char character_array[4] = { 'A', 'B', 'C', 'D' };
or
char character_array[4] = "ABCD";
or
char character_array[4] = { '\0' };
or
char character_array[4] = "";
and so on.
In all these cases the array has 4 objects of type char. In the last two cases you may suppose that the array contains strings (empty strings) because the array has an element with zero character ( '\0' ). That is in the last two cases you may apply to the array functions that deal with strings.
Or another example
char character_array[4] = { 'A', 'B', '\0', 'C' };
You can deal with the array as if it had a string "AB" or just four objects.
Consider this demonstrative program
#include <stdio.h>
#include <string.h>
int main( void )
{
char character_array[4] = { 'A', 'B', '\0', 'C' };
char *p = strchr(character_array, 'C');
if (p == NULL)
{
printf("Character '%c' is not found in the array\n", 'C');
}
else
{
printf("Character '%c' is found in the array at position %zu\n",
'C',
(size_t)(p - character_array));
}
p = ( char * )memchr(character_array, 'C', sizeof(character_array));
if (p == NULL)
{
printf("Character '%c' is not found in the array\n", 'C');
}
else
{
printf("Character '%c' is found in the array at position %zu\n",
'C',
(size_t)(p - character_array));
}
}
The program output is
Character 'C' is not found in the array
Character 'C' is found in the array at position 3
In the first part of the program it is assumed that the array contains a string. The standard string function strchr just ignores all elements of the array after encountering the element with the value '\0'.
In the second part of the program it is assumed that the array contains a sequence of objects with the length of 4. The standard function memchr knows nothing about strings.
Conclusion.
This array
char character_array[4];
can contain 4 objects of type char. It is so declared.
The array can be said to contain a string, if its content is interpreted as a string, provided that at least one element of the array is equal to '\0'.
For example, if the array is declared like
char character_array[4] = "A";
that is equivalent to
char character_array[4] = { 'A', '\0', '\0', '\0' };
then it may be said that the array contains the string "A" with the length equal to 1. On the other hand, the array actually contains 4 objects of type char, as the second equivalent declaration shows.
You just reserve 4 bytes to fill. If you write to _array[4] (the fifth character) you have a so-called buffer overflow, meaning you write to memory that was not reserved.
If you store a string in 4 characters, you have actually just 3 characters for printable characters (_array[0], ..., _array[2]) and the last one (_array[3]) is just for keeping the string termination '\0'.
For instance, in your case the function strlen() parses until such string termination '\0' and returns length=3.

When/Why is '\0' necessary to mark end of an (char) array?

So I just read an example of how to create an array of characters which represent a string.
The null-character \0 is put at the end of the array to mark the end of the array. Is this necessary?
If I created a char array:
char line[100];
and put the word:
"hello\n"
in it, the chars would be placed at the first six indexes line[0] - line[5], so the rest of the array would be filled with null characters anyway?
The book says that it is a convention that, for example, the string constant "hello\n" is put in a character array and terminated with \0.
Maybe I don't understand this topic to its full extent and would be glad for enlightenment.
The \0 character does not mark the "end of the array". The \0 character marks the end of the string stored in a char array, if (and only if) that char array is intended to store a string.
A char array is just a char array. It stores independent integer values (char is just a small integer type). A char array does not have to end in \0. \0 has no special meaning in a char array. It is just a zero value.
But sometimes char arrays are used to store strings. A string is a sequence of characters terminated by \0. So, if you want to use your char array as a string you have to terminate your string with a \0.
So, the answer to the question about \0 being "necessary" depends on what you are storing in your char array. If you are storing a string, then you will have to terminate it with a \0. If you are storing something that is not a string, then \0 has no special meaning at all.
'\0' is not required if you are using it as character array. But if you use character array as string, you need to put '\0'. There is no separate string type in C.
There are multiple ways to declare character array.
Ex:
char str1[] = "my string";
char str2[64] = "my string";
char str3[] = {'m', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0'};
char str4[64] = {'m', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g' };
All these arrays have the same string "my string". In str1, str2, and str4, the '\0' character is added automatically, but in str3, you need to explicitly add the '\0' character.
(When the size of an array is explicitly declared, and there are fewer items in the initializer list than the size of the array, the rest of the array is initialized with however many zeros it takes to fill it; see C char array initialization and The N_ELEMENTS macro.)
When/Why is '\0' necessary to mark end of an (char) array?
The terminating zero is necessary if a character array contains a string. It makes it possible to find the point where the string ends.
As for your example, which I think looks like this
char line[100] = "hello\n";
then for starters the string literal has 7 characters. It is a string and includes the terminating zero. This string literal has type char[7]. You can imagine it like
char no_name[] = { 'h', 'e', 'l', 'l', 'o', '\n', '\0' };
When a string literal is used to initialize a character array then all its characters are used as initializers. So relative to the example the seven characters of the string literal are used to initialize first 7 elements of the array. All other elements of the array that were not initialized by the characters of the string literal will be initialized implicitly by zeroes.
If you want to determine how long is the string stored in a character array you can use the standard C function strlen declared in the header <string.h>. It returns the number of characters in an array before the terminating zero.
Consider the following example
#include <stdio.h>
#include <string.h>
int main(void)
{
char line[100] = "hello\n";
printf( "The size of the array is %zu"
"\nand the length of the stored string \n%s is %zu\n",
sizeof( line ), line, strlen( line ) );
return 0;
}
Its output is
The size of the array is 100
and the length of the stored string
hello
is 6
In C you may use a string literal to initialize a character array excluding the terminating zero of the string literal. For example
char line[6] = "hello\n";
In this case you may not say that the array contains a string because the sequence of symbols stored in the array does not have the terminating zero.
You need the null character to mark the end of the string. C does not store the length of a string anywhere, and once an array is passed to a function it is seen only as a pointer, so the null character/byte \0 marks where the string ends.
This is only required for strings, however – you can have any ordinary array of characters that does not represent a string.
For example, try this piece of code:
#include <stdio.h>
int main(void) {
char string[1];
string[0] = 'a';
printf("%s", string);
}
Note that the character array is completely filled with data. Thus, there is no null byte to mark the end. Now, printf will keep reading until it hits a null byte. That will be somewhere past the end of the array, which is undefined behavior; typically you will see junk printed in addition to just "a".
Now, try this:
#include <stdio.h>
int main(void) {
char string[2];
string[0] = 'a';
string[1] = '\0';
printf("%s", string);
}
It will only print "a", because the end of the string is explicitly marked.
The length of a C string (an array containing the characters and terminated with a '\0' character) is found by searching for the (first) NUL byte. \0 is zero character. In C it is mostly used to indicate the termination of a character string.
Here is an example.
Let's say you write a word to a file:
const char *word = "Hello"; // 5 characters plus the terminating '\0'
fwrite(word, sizeof(char), 6, fp);
Here we write the 5 characters of "Hello" plus one more for its terminating '\0'. fp is the file.
And now, we write another word after the first one:
const char *word2 = "world!"; // 6 characters plus the terminating '\0'
fwrite(word2, sizeof(char), 7, fp);
So now, let's read the two words back (after rewinding or reopening the file):
char *buff = malloc(sizeof(char) * 1000); // The buffer can be larger than needed; it won't change the final result
/* 13 = (5 characters from 'Hello')+(1 character of the \0)+(6 characters from 'world!')+(1 character from the \0) */
fread(buff, sizeof(char), 13, fp); // We read the words 'Hello\0' and 'world!\0'
printf("the content of buff is: %s", buff); // This prints only 'Hello': %s stops at the first \0
printf("%s", buff + 6); // The second string starts right after the first \0; this prints 'world!'
The \0 characters are what let C tell that there are two separate strings in the buffer. If we had not written the \0 at the end of both words and repeated the same example, the buffer would hold "Helloworld!" with no terminator, and printing it with %s would read past the data that was actually read from the file.
Many string functions rely on this convention.

Why is the entirety of this first array being added onto the second, on top of the two values (from the first) that I assign it?

I want to assign the first two values from the hash array to the salt array.
char hash[] = {"HAodcdZseTJTc"};
char salt[] = {hash[0], hash[1]};
printf("%s", salt);
However, when I attempt this, the first two values are assigned and then all thirteen values are also assigned to the salt array. So my output here is not:
HA
but instead:
HAHAodcdZseTJTc
salt is not null-terminated. Try:
char salt[] = {hash[0], hash[1], '\0'};
You are adding just two characters to the salt array, and you are not adding the '\0' terminator.
Passing an array that is not nul-terminated to printf() with a "%s" specifier causes undefined behavior. In your case it prints hash; in my case
HA#
was printed.
Strings in C use a special convention to know where they end: a non-printable special character '\0' is appended at the end of a sequence of non-'\0' bytes, and that's how a C string is built.
For example, if you were to compute the length of a string you would do something like
size_t stringlength(const char *string)
{
size_t length;
for (length = 0 ; string[length] != '\0' ; ++length);
return length;
}
there are of course better ways of doing it, but I just want to illustrate what the significance of the terminating '\0' is.
Now that you know this, you should notice that
char string[] = {'A', 'B', 'C'};
is an array of char but it's not a string, for it to be a string, it needs a terminating '\0', so
char string[] = {'A', 'B', 'C', '\0'};
would actually be a string.
Notice that then, when you allocate space to store n characters, you need to allocate n + 1 bytes, to make room for the '\0'.
In the case of printf() it will try to consume all the bytes that the passed pointer points at, until one of them is '\0', there it would stop iterating through the bytes.
That also explains the Undefined Behavior thing, because clearly printf() would be reading out of bounds, and anything could happen; it depends on what is actually there at the memory address that does not belong to the passed data but is out of bounds.
There are many functions in the standard library that expect strings, i.e. sequences of non-nul bytes followed by a nul byte.

Resources