NUL character and static character arrays/string literals in C

I understand that strings are terminated by a NUL '\0' byte in C.
However, what I can't figure out is why a 0 in a string literal acts differently from a 0 in a char array created on the stack. When checking for NUL terminators in a literal, the zeros in the middle of the array are not treated as such.
For example:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
int main()
{
/* here, one would expect strlen to evaluate to 2 */
char *confusion = "11001";
size_t len = strlen(confusion);
printf("length = %zu\n", len); /* why is this == 5, as opposed to 2? */
/* why is the entire segment printed here, instead of the first two bytes?*/
char *p = confusion;
while (*p != '\0')
putchar(*p++);
putchar('\n');
/* this evaluates to true ... OK */
if ((char)0 == '\0')
printf("is null\n");
/* and if we do this ... */
char s[6];
s[0] = 1;
s[1] = 1;
s[2] = 0;
s[3] = 0;
s[4] = 1;
s[5] = '\0';
len = strlen(s); /* len == 2, as expected. */
printf("length = %zu\n", len);
return 0;
}
output:
length = 5
11001
is null
length = 2
Why does this occur?

The variable 'confusion' is a pointer to char that points to a string literal.
So the memory looks something like
[11001\0]
So when you print the variable 'confusion', it prints everything up to the first null character, which is written as \0.
The zeros in "11001" are not null characters. Because they appear inside double quotes, they are the digit character '0' (ASCII 48).
However, in your char array assignment for variable 's', you are assigning the decimal value 0 to a char variable. That stores the value 0, which is the ASCII NUL character. So the character array looks something like this in memory:
[SOH, SOH, NUL, NUL, SOH, NUL]
SOH is the ASCII control character with decimal value 1 (some consoles, such as those using the old DOS code page, display it as a smiley face).
So when you print it, printing stops at the first NUL, and strlen returns 2.
The trick here is understanding what really gets assigned to a character variable when a decimal value is assigned to it.
Try this code:
#include <stdio.h>
int
main(void)
{
char c = 0;
printf( "%c\n", c ); //Prints the NUL character itself (usually invisible).
printf( "%d\n", c ); //Prints the decimal value.
return 0;
}

You can view an ASCII table (e.g. http://www.asciitable.com/) to check the exact values of the character '0' and the null character.

'0' and 0 are not the same value. (The first one is 48, usually, although technically the precise value is implementation-defined and it is considered very bad style to write 48 to refer to the character '0'.)
If a '0' terminated a character string, you wouldn't be able to put zeros in strings, which would be a bit... limiting.

Related

Why do we need to check string length larger than 0?

I got this example from CS50. I know that we need to check "s == NULL" in case there is no memory available. But I am not sure why we need to check the string length of t before capitalizing it.
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// Get a string
char *s = get_string("s: ");
if (s == NULL)
{
return 1;
}
// Allocate memory for another string
char *t = malloc(strlen(s) + 1);
if (t == NULL)
{
return 1;
}
// Copy string into memory
strcpy(t, s);
// Why do we need to check this condition?
if (strlen(t) > 0)
{
t[0] = toupper(t[0]);
}
// Print strings
printf("s: %s\n", s);
printf("t: %s\n", t);
// Free memory
free(t);
return 0;
}
Why do we need to use "if (strlen(t) > 0)" before capitalizing?
Conceptually, there is no character to uppercase when the string is empty.
Technically, it's not needed. The first character of an empty string is 0, and toupper(0) is 0.
Note that strlen(t) > 0 can also be written as t[0] != 0 or just t[0]. There's no need to actually calculate the length of the string to find out if it's an empty string.
Also, make sure to read chux's answer for a correction regarding signed char.
// Why do we need to check this condition?
There is no need for the check. A string of length 0 consists of only a null character and toupper('\0'); returns '\0'.
Advanced: There is a need for something else though.
char may behave as either signed char or unsigned char. If t[0] < 0 (for example, after entering 'é'), then toupper(negative) is undefined behavior (UB): toupper() is only defined for EOF (some negative value) and for values in the unsigned char range.
A more valuable code change, though pedantic, would be to access the characters as if they were unsigned char, then call toupper().
// if (strlen(t) > 0) { t[0] = toupper(t[0]); }
t[0] = (char) toupper(((unsigned char*)t)[0]);
// or
t[0] = (char) toupper(*(unsigned char*)t);
For any string t, the valid indexes (of the actual characters in the string) will be 0 to strlen(t) - 1.
Using strlen(t) as index will be the index of the null-terminator (assuming that it's a "proper" null-terminated string).
If strlen(t) == 0, then t[0] will be the null terminator, and calling toupper on the null terminator makes no sense. That is what the check does: it makes sure that there is at least one actual character (besides the null terminator) in the string.
In other words: it checks that the string isn't empty.

Assignment after initialization to specific index in an array

After assigning a character to index 26, printing the array still shows just "Computer", even though I assigned a character at index 26. I expected something like this: "Computer K "
What is the reason?
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); /*prints out "Computer"*/
m1[26] = 'K';
printf("%s\n", m1); /*prints out "Computer"*/
printf("%c", m1[26]); /*prints "K"*/
}
At index 8 of that array the \0 character is found, and %s prints only until it finds a \0 (which marks the end of the string). The character 'K' is at index 26, but it is not printed because the \0 comes before it.
char s[100] = "Computer";
is basically the same as
char s[100] = { 'C', 'o', 'm', 'p', 'u','t','e','r', '\0'};
Since printf stops at the terminating 0, it won't print the character at index 26.
Whenever you partially initialize an array, the remaining elements are filled with zeroes. (This is a rule in the C standard, C17 6.7.9 §19.)
Therefore char m1[40] = "Computer"; ends up in memory like this:
[0] = 'C'
[1] = 'o'
...
[7] = 'r'
[8] = '\0' // the null terminator you automatically get by using the " " syntax
[9] = 0 // everything to zero from here on
...
[39] = 0
Now of course \0 and 0 mean the same thing, the value 0. Either will be interpreted as a null terminator.
If you go ahead and overwrite index 26 and then print the array as a string, it will still only print until it encounters the first null terminator at index 8.
If you do this instead, however:
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); // prints out "Computer"
m1[8] = 'K';
printf("%s\n", m1); // prints out "ComputerK"
}
You overwrite the null terminator, and the next zero that happened to be in the array is treated as null terminator instead. This code only works because we partially initialized the array, so we know there are more zeroes trailing.
Had you instead written
#include <string.h>
int main()
{
char m1[40]; // automatic storage: contents are indeterminate
strcpy(m1, "Computer"); // writes indices 0 to 8, with '\0' at index 8
m1[8] = 'K'; // index 9 onward is still indeterminate
}
This is not initialization but run-time assignment. strcpy would only set indices 0 to 8 ("Computer" with the null terminator at index 8). The remaining elements would be left uninitialized, holding indeterminate (garbage) values, and writing m1[8] = 'K' would destroy the string, as it would then no longer be reliably null terminated. You would get undefined behavior when trying to print it: something like garbage output or a program crash.
In C strings are 0-terminated.
Your initialization fills all array elements after the 'r' with 0.
If you place a non-0 character in any random field of the array, this does not change anything in the fields before or after that element.
This means your string is still 0-terminated right after the 'r'.
How would any function know that some other characters might follow after that string?
That's because after "Computer" there's a null terminator (\0) in your array. If you add a character after this \0, it won't be printed because printf() stops printing when it encounters a null terminator.
Just as an addition to the other users' answers: you should try to answer your question by being more proactive in your learning. Writing a simple program is enough to understand what is happening.
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); /*prints out "Computer"*/
m1[26] = 'K';
for(size_t index = 0; index < 40; index++)
{
printf("m1[%zu] = 0x%hhx ('%c')\n", index, (unsigned char)m1[index], (m1[index] >=32) ? m1[index] : ' ');
}
}

My function goes over the length of string

I am trying to make a function that compares all the letters of the alphabet to the string I enter, and prints the letters I didn't use. But when I print those letters, it runs past the end and gives me random symbols. Here is a link to the function, how I call it, and the result: http://imgur.com/WJRZvqD,U6Z861j,PXCQa4V#0
Here is the code: (http://pastebin.com/fCyzFVAF)
void getAvailableLetters(char lettersGuessed[], char availableLetters[])
{
char alphabet[]={'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};
int LG,LG2,LA=0;
for (LG=0;LG<=strlen(alphabet)-1;LG++)
{
for(LG2=0;LG2<=strlen(lettersGuessed)-1;LG2++)
{
if (alphabet[LG]==lettersGuessed[LG2])
{
break;
}
else if(alphabet[LG]!=lettersGuessed[LG2] &&LG2==strlen(lettersGuessed)-1)
{
availableLetters[LA]=alphabet[LG];
LA++;
}
}
}
}
Here is program to call the function:
#include <stdio.h>
#include <string.h>
#include "hangman.c"
int main()
{
int i = 0;
char result[30];
char text[30];
scanf("%s", text);
while(i != strlen(text))
{
i++;
}
getAvailableLetters(text, result);
printf("%s\n", result);
printf ("%d", i);
printf ("\n");
}
Here is the result when I typed in abcd: efghijklmnopqrstuvwxyzUw▒ˉ
If you want to print result as a string, you need to include a terminating null at the end of it (that's how printf knows when to stop).
With %s, printf stops printing when it reaches a null character '\0', because %s expects the string to be null terminated. result is not null terminated, and that's why you get random symbols at the end.
Just add availableLetters[LA] = '\0'; as the last line of the function getAvailableLetters.
http://pastebin.com/fCyzFVAF
Make sure your string is null terminated (i.e. has a '\0' character at the end). That also means ensuring the buffer that holds the string is large enough to contain the null terminator.
Sometimes you think you have a null-terminated string, but the data has overflowed the buffer, so the terminator ended up outside it or was never written inside it. That is one reason to always prefer the variants of buffer-writing functions that let you explicitly limit the length (not applicable in this case): for example, call snprintf() instead of sprintf(). Unbounded writes into buffers are a classic way to get seriously hacked with a virus or exploit.
char alphabet[]={'a','b','c', ... ,'x','y','z'}; is not a string. It is simply an "array 26 of char".
In C, "A string is a contiguous sequence of characters terminated by and including the first null character. ...". C11 §7.1.1 1
strlen(alphabet) expects a string. Since the code did not provide one, the behavior is undefined.
To fix this, ensure alphabet is a string.
char alphabet[]={'a','b','c', ... ,'x','y','z', 0};
// or
char alphabet[]={"abc...xyz"}; // compiler appends a \0
Now alphabet is "array 27 of char" and also a string.
2nd issue: for(LG2=0;LG2<=strlen(lettersGuessed)-1;LG2++) has 2 problems.
1) Each time through the loop, code recalculates the length of the string. Better to calculate the string length once since the string length does not change within the loop.
size_t len = strlen(lettersGuessed);
for (LG2 = 0; LG2 <= len - 1; LG2++)
2) strlen() returns the type size_t, which is some unsigned integer type. Should lettersGuessed have a length of 0 (it might be ""), the string length - 1 is not -1 but some very large number, because unsigned arithmetic "wraps around", and the loop may never stop. A simple solution follows; it would only fail if the length of the string exceeded INT_MAX.
int len = (int) strlen(lettersGuessed);
for (LG2 = 0; LG2 <= len - 1; LG2++)
A solution without this limitation would use size_t throughout.
size_t LG2;
size_t len = strlen(lettersGuessed);
for (LG2 = 0; LG2 < len; LG2++)

static array of char not filled

I want to fill an array of char one character at a time, so I am using the code below for testing, but the final output of "str" is always just the first char entered instead of all of them. What's wrong?
void gime_char(char c)
{
static *str;
static i;
if(i == 0)
str = malloc(sizeof(*str) * 10);
if(c == 'X')
{
printf("full str:%s\n", str);
}
printf("c == %c\n", c);
str[i] = c;
printf("d == %d\n", i);
i++;
}
int main()
{
char c;
while(c != 'X')
{
c = getchar();
gime_char(c);
}
}
The type of static *str is static int *str, but a string consists of chars. So now your string does not end up being stored with the characters adjacent to one another as you would expect, because each element of str has the size of int (probably 4 bytes), not that of char.
This part of the code should be fixed by specifying the type as static char *str. Once you fix that, there will be another problem when printing str without terminating it with a NUL character ('\0').
You're missing a type in the definition of i, and the type in the definition of str is incomplete: you say “pointer” (with the * character) but you don't say to what. For historical reasons, if you omit a type, the compiler assumes you meant int; this "implicit int" rule was removed from the language in C99, and good compilers warn about it.
Since you effectively wrote int *str, the memory layout looks something like this after entering hello (this is machine-dependent, but this is a pretty typical case):
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----
| 'h' |  0  |  0  |  0  | 'e' |  0  |  0  |  0  | 'l' |  0  | ...
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----
   ^                       ^                       ^
   |                       |                       |
  str                    str+1                   str+2
Each small cell is one byte (which corresponds to one character). Four cells make up one int (int is 4 bytes on this typical machine). The line
str[i] = c;
writes one int into str. For example, when c is 'h', which has the numerical value 104, the number 104 is written into the int object str[0], which is represented as the four-byte sequence {104, 0, 0, 0}. This happens again for the next character, which is written in the next int-sized slot in the array, meaning 4 bytes further.
In the line
printf("full str:%s\n", str);
you print str as a string. In a string, the first zero byte marks the end of the string. So you see the string "h".
Your machine is little-endian. Exercise: what would you see on a big-endian machine?
The fix is to declare the types properly.
You should also initialize your variables. static variables are initialized to 0 anyway, but it's clearer if you do it explicitly. The variable c is not static, so it starts out containing whichever value was there before in memory; this could happen to be 'X', so you must initialize it explicitly.
Additionally, you need to make sure that the string is terminated by a zero byte before printing it. The static keyword ensures that the str variable is initialized to a null pointer, but the space that the pointer points to is allocated by malloc and contains whatever was there before.
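#include <stdio.h>
#include <stdlib.h>
void gime_char(char c)
{
static char *str; /* <<<< */
static int i; /* <<<< */
if(i == 0)
{
str = malloc(sizeof(*str) * 10);
if(str == NULL) /* <<<< malloc can fail */
exit(EXIT_FAILURE);
}
if(i >= 9 && c != 'X') /* <<<< buffer full: keep the last byte for '\0' */
return;
if(c == 'X')
{
str[i] = 0; /* <<<< */
printf("full str:%s\n", str);
}
printf("c == %c\n", c);
str[i] = c;
printf("d == %d\n", i);
i++;
}
int main()
{
char c = 0; /* <<<< */
while(c != 'X')
{
int ch = getchar(); /* <<<< getchar returns int so that EOF can be detected */
if(ch == EOF)
break;
c = (char)ch;
gime_char(c);
}
return 0; /* <<<< */
}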
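Advice:
Check that malloc didn't return NULL.
i has no type; declare it as static int i (as a static variable it starts at 0, but an explicit initializer is clearer).
The type of str is wrong (it should be static char *str).
Before using %s, you have to add a '\0' at the end of your string.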
I would say your problem comes from the fact that you declare static *str;.
You declare it without specifying the type! Compilers accept this with a warning that it implies int, so you end up with a static pointer to int, which is not meant to store C strings.
Later you allocate space for the buffer with str = malloc(sizeof(*str) * 10);. With str declared as a pointer to int, sizeof(*str) is sizeof(int), so this allocates space for 10 ints rather than 10 chars.
You want to work with characters.
static char *str;
if(i == 0)
str = malloc(sizeof(char) * 10);
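Also, you should initialize the buffer to zeros, because malloc returns uninitialized memory that almost certainly contains garbage, and a C string is expected to be null terminated. Initialize i explicitly too: as a static variable it already starts at 0, but compilers do not initialize ordinary (automatic) variables on the stack for you.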

Representation of C string at memory and comparison

I have this code:
char str1[100] = "Hel0lo";
char *p;
for (p = str1; *p != 0; p++) {
cout << *p << endl;
/* skip along till the end */
}
and there are some parts not clear for me.
I understand that a null-terminated string ends with a byte whose bits are all 0 (ASCII NUL). That's why, when *p == 0, we decide that we have found the end of the string. If I wanted to search up to the '0' character instead, I would compare with 48, which is the decimal ASCII code of '0'.
But why do we use hex numbers when accessing memory and decimal numbers for comparisons?
Is it possible to compare against "\0" to detect the end of the string? Something like this (not working):
for (p = str1; *p != "\0"; p++) {
And as I understand it, "\48" is equal to 0?
Your loop includes the exit test
*p != "\0"
This takes the value of the char that p points to, promotes it to int, and then compares it against the address of the string literal "\0". I think you meant to compare against the NUL character '\0' instead:
*p != '\0'
Regarding comparisons against hex, decimal or octal values: there are no set rules. You can use them interchangeably, but you should use whichever makes your code easiest to read. People often use hex with bitwise operations or when handling binary data. Remember, however, that in ASCII '0' is identical to 48, 0x30 and '\060', but different from '\0'.
Yes, you can detect the end of the string like this:
for (p = str1; *p != '\0'; p++) {
// your code
}
The value of the '\0' character is 0 (zero), so you could just write:
for (p = str1; *p ; p++) {
// your code
}
As @Whozraig also commented, this works because when *p is '\0' its value is 0, which is false, so the loop stops.
