Representation of C string at memory and comparison

I have this code:
char str1[100] = "Hel0lo";
char *p;
for (p = str1; *p != 0; p++) {
cout << *p << endl;
/* skip along till the end */
}
There are some parts that are not clear to me.
I understand that, in memory, a null-terminated string ends with a byte whose bits are all 0 (the ASCII NUL). That's why the test *p != 0 tells us whether we have reached the end of the string. If I wanted to search up to the character '0' instead, I would compare with 48, the decimal ASCII code of '0'.
But why do we use hex numbers when looking at memory, yet decimal numbers in comparisons?
Is it possible to compare against "\0" to detect the end of the string? Something like this (it does not work):
for (p = str1; *p != "\0"; p++) {
And, as I understand it, "\48" is equal to 0?

Your loop includes the exit test
*p != "\0"
This takes the value of the char that p points to, promotes it to int, and then compares it against the address of the string literal "\0". I think you meant to compare against the NUL character '\0' instead:
*p != '\0'
Regarding comparison against hex, decimal or octal values: there are no set rules. You can use them interchangeably, but you should use whatever makes your code easiest to read. People often use hex with bitwise operations or when handling binary data. Remember, however, that (on an ASCII system) '0' is identical to 48, 0x30 and '\060', but different from '\0'.

Yes, you can test for the end of the string like this:
for (p = str1; *p != '\0'; p++) {
// your code
}
The ASCII value of the '\0' character is 0 (zero), so you could just write:
for (p = str1; *p ; p++) {
// your code
}
As @Whozraig also commented, this works because when *p is '\0' its value is 0, which is false.

Related

Why do we need to check string length larger than 0?

I got this example from CS50. I know that we need to check "s == NULL" in case no memory could be allocated. But I am not sure why we need to check the string length of t before capitalizing it.
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// Get a string
char *s = get_string("s: ");
if (s == NULL)
{
return 1;
}
// Allocate memory for another string
char *t = malloc(strlen(s) + 1);
if (t == NULL)
{
return 1;
}
// Copy string into memory
strcpy(t, s);
// Why do we need to check this condition?
if (strlen(t) > 0)
{
t[0] = toupper(t[0]);
}
// Print strings
printf("s: %s\n", s);
printf("t: %s\n", t);
// Free memory
free(t);
return 0;
}
Why do we need to use "if (strlen(t) > 0)" before capitalizing?

Conceptually, there is no character to uppercase when the string is empty.
Technically, it's not needed. The first character of an empty string is 0, and toupper(0) is 0.
Note that strlen(t) > 0 can also be written as t[0] != 0 or just t[0]. There's no need to actually calculate the length of the string to find out if it's an empty string.
Also, make sure to read chux's answer for a correction regarding signed char.

// Why do we need to check this condition?
There is no need for the check. A string of length 0 consists of only a null character, and toupper('\0') returns '\0'.
Advanced: there is a need for something else, though.
char may act as a signed or unsigned char. If t[0] < 0 (perhaps because 'é' was entered), then toupper(negative) is undefined behavior (UB). toupper() is only defined for EOF (some negative value) and for values in the unsigned char range.
A more valuable code change, though pedantic, would be to access the characters as if they were unsigned char, then call toupper().
// if (strlen(t) > 0) { t[0] = toupper(t[0]); }
t[0] = (char) toupper(((unsigned char*)t)[0]);
// or
t[0] = (char) toupper(*(unsigned char*)t);

For any string t, the valid indexes (of the actual characters in the string) will be 0 to strlen(t) - 1.
Using strlen(t) as index will be the index of the null-terminator (assuming that it's a "proper" null-terminated string).
If strlen(t) == 0, then t[0] is the null-terminator, and calling toupper on the null-terminator makes no sense. That is what the check does: it makes sure there is at least one actual character (besides the null-terminator) in the string.
In other words, it checks that the string isn't empty.

NUL character and static character arrays/string literals in C

I understand that strings are terminated by a NUL '\0' byte in C.
However, what I can't figure out is why a 0 in a string literal acts differently from a 0 in a char array created on the stack. When checking for NUL terminators in a literal, the zeros in the middle of the array are not treated as such.
For example:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
int main()
{
/* here, one would expect strlen to evaluate to 2 */
char *confusion = "11001";
size_t len = strlen(confusion);
printf("length = %zu\n", len); /* why is this == 5, as opposed to 2? */
/* why is the entire segment printed here, instead of the first two bytes?*/
char *p = confusion;
while (*p != '\0')
putchar(*p++);
putchar('\n');
/* this evaluates to true ... OK */
if ((char)0 == '\0')
printf("is null\n");
/* and if we do this ... */
char s[6];
s[0] = 1;
s[1] = 1;
s[2] = 0;
s[3] = 0;
s[4] = 1;
s[5] = '\0';
len = strlen(s); /* len == 2, as expected. */
printf("length = %zu\n", len);
return 0;
}
output:
length = 5
11001
is null
length = 2
Why does this occur?

The variable 'confusion' is a pointer to the first char of a string literal.
So the memory looks something like this:
[11001\0]
So when you print 'confusion', everything up to the first null character (written \0) is printed.
The zeros in "11001" are not null characters; they are the digit character '0', because they appear inside double quotes.
However, in your assignments to the char array 's', you assign the integer value 0 to a char element. That stores the value 0, which is the NUL character, not the digit '0'. So the character array looks something like this in memory:
[happyface, happyface, NULL, NULL, happyface, NULL]
The character with value 1 is the ASCII control character SOH; some consoles (for example, those using code page 437) display it as a happy face.
So strlen counts everything before the first NULL, which is why it returns 2.
The trick here is understanding what really gets assigned to a character variable when a decimal value is assigned to it.
Try this code:
#include <stdio.h>
int
main(void)
{
char c = 0;
printf( "%c\n", c ); //Prints the ASCII character which is NULL.
printf( "%d\n", c ); //Prints the decimal value.
return 0;
}

You can view an ASCII table (e.g. http://www.asciitable.com/) to check the exact values of the character '0' and of the null character.

'0' and 0 are not the same value. (The first one is 48, usually, although technically the precise value is implementation-defined and it is considered very bad style to write 48 to refer to the character '0'.)
If a '0' terminated a character string, you wouldn't be able to put zeros in strings, which would be a bit... limiting.

How does "for ( ; *p; ++p) *p = tolower(*p);" work in C?

I'm fairly new to programming and was just wondering why this code:
for ( ; *p; ++p) *p = tolower(*p);
works to convert a string to lower case in C when p points to a string?

In general, this code:
for ( ; *p; ++p) *p = tolower(*p);
does not, in general, work to lowercase a string when p points to a string.
It does work for pure ASCII, but since char is usually a signed type, and since tolower requires an argument in the unsigned char range (or the special value EOF), the code will in general have Undefined Behavior.
To avoid that, cast the argument to unsigned char, like this:
for ( ; *p; ++p) *p = tolower( (unsigned char)*p );
Now it can work for single-byte encodings like Latin-1, provided you have set the correct locale via setlocale, e.g. setlocale(LC_ALL, "");. Note, however, that the very common UTF-8 encoding does not use a single byte per character. To deal with UTF-8 text, you can convert it to a wide string and lowercase that.
Details:
*p is an expression that denotes the object that p points to, presumably a char.
As the continuation condition of the for loop, any non-zero char value that *p denotes acts as logical true, while the zero char value at the end of the string acts as logical false, ending the loop.
++p advances the pointer to point to the next char.

To unpick, let's assume p is a pointer to a char and just before the for loop, it points to the first character in a string.
In C, strings are typically modelled as a sequence of contiguous char values with a final 0 added at the end, which acts as the null terminator.
*p will evaluate to 0 once the string null-terminator is reached. Then the for loop will exit. (The second expression in the for loop acts as the termination test).
++p advances to the next character in the string.
*p = tolower(*p) sets that character to lower case.

I do not understand strcmp results

This is my implementation of strcmp:
#include <stdio.h>
#include <string.h>
int ft_strcmp(const char *s1, const char *s2)
{
while (*s1 == *s2)
{
if (*s1 == '\0')
return (0);
s1++;
s2++;
}
return (*s1 - *s2);
}
int main()
{
char s1[100] = "bon";
char s2[100] = "BONN";
char str1[100] = "bon";
char str2[100] = "n";
printf("%d\n", ft_strcmp(s1, s2));
printf("%d\n", ft_strcmp(str1, str2));
return (0);
}
It is based on the one in Kernighan and Ritchie's book, but I use a while loop instead of a for loop. I've tested it many times and my strcmp gives the same results as the original strcmp,
but I do not understand the results. I read the man page:
"The strcmp() and strncmp() functions lexicographically compare the null-terminated strings s1 and s2."
What does "lexicographically" mean?
"return an integer greater than, equal to, or less than 0, according as the string s1 is greater than, equal to, or less than the string s2."
I understand this part, but my question is how it comes up with results like these:
32
-12
s1 looks less than s2 to me, so how and why do I get 32, and how is it calculated?
str1 looks greater than str2 to me, so how and why do I get -12, and how is it calculated?
I've compiled it with the real strcmp and I get the same results.
Last question: why do I need to compare *s1 to '\0'? Won't it work fine without that?
Thank you for your answers; I'm confused.

1) K&R's version compares the ASCII values of the first pair of characters that differ. That's why you get 32 ('b' is 98 and 'B' is 66, and 98 - 66 = 32) and -12 ('b' is 98 and 'n' is 110, and 98 - 110 = -12). Check an ASCII table and you'll see.
2) If you don't check for '\0', how can you know when the string ends? That's the C string terminator.

In terms of ASCII codes, capital letters actually precede lowercase letters, as an ASCII table shows.
So in terms of lexicographic ordering, s1 is treated as greater than s2, because at the first position where they differ, s1's character has the larger ASCII value.

So we compare *s1 to '\0' to see when the string ends,
and the result is computed from the values of the first pair of characters that differ between the two strings.

int ft_strcmp(const char *s1, const char *s2)
{
int x;
x = 0;
/* advance while both strings continue and still match */
while (s1[x] != '\0' && s2[x] != '\0' && s1[x] == s2[x])
x++;
return (s1[x] - s2[x]);
}
by mokgohloa ally

Why is strcmp not returning 0 in this context?

So I'm reading in chars one by one from a file:
char temp[3];
temp[0] = nextchar;
printf("%c",temp[0]); //prints %
temp[1] = nextchar = fgetc(srcptr);
printf("%c",temp[1]); //prints 2
temp[2] = nextchar = fgetc(srcptr);
printf("%c",temp[2]); //prints 0
if(strcmp(temp, "%20") == 0) { printf("%s","blahblah"); }
Ideally this should print "blahblah" at the end. However, it doesn't. So why is strcmp not returning 0, and, more importantly, how can I fix it?

You need to null terminate temp.
EDIT
Change char temp[3]; to char temp[4]; and set temp[3] = 0;

Use memcmp instead, because strcmp expects both strings to be '\0'-terminated (and temp is not):
if(memcmp(temp, "%20", sizeof(temp)) == 0) { printf("%s","blahblah"); }

A string is an array of characters ending with the '\0' character. Since your temp array can hold three characters and none of them is the terminating null character, strcmp (like any other string function) will assume the string continues, reading memory past the allocated space until it encounters a null character (or crashes as it tries to read restricted memory).
The string "%20" is really the characters: '%', '2', '0', '\0'
So the easiest way to fix it is to declare temp one element larger and assign '\0' to the last element:
char temp[4];
...
temp[3] = '\0';

Develop Reference
