Understanding the strlen function in C

I am learning C, and I came across this function that finds the length of a string.
size_t strlen(const char *str)
{
    size_t len = 0U;
    while(*(str++)) ++len;
    return len;
}
Now, when does the loop exit? I am confused, since str++ always increases the pointer.

while(*(str++)) ++len;
is, as far as the resulting length goes, the same as:
while(*str) {
++len;
++str;
}
which is the same as:
while(*str != '\0') {
++len;
++str;
}
So now you see: when str points to the null character at the end of the string, the test condition fails and the loop stops.
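A small sketch, not taken from the answers above, that puts the two forms side by side. The names len_postinc and len_explicit are made up for illustration. Both return the same count; the only difference is where the local pointer is left when the loop exits, which does not matter here because the pointer is discarded.

```c
#include <stddef.h>

/* Post-increment form: the pointer is advanced even on the final,
   failing test, so it ends one past the terminator. */
size_t len_postinc(const char *str)
{
    size_t len = 0U;
    while (*(str++))
        ++len;
    return len;
}

/* Explicit form: the pointer stops on the terminator itself. */
size_t len_explicit(const char *str)
{
    size_t len = 0U;
    while (*str != '\0') {
        ++len;
        ++str;
    }
    return len;
}
```

Calling either function on "hello" should give 5, and on "" should give 0.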

C strings are terminated by the NUL character, which has the value 0.
0 is false in C and anything else is true.
So we keep incrementing the pointer into the string and the length until we find a NUL and then return.

You need to understand two notions to grasp the idea of the function:
1) A C string is an array of characters.
2) In C, an array used in an expression or passed to a function is converted to a pointer to its first element.
So what does strlen do? It uses pointer arithmetic to walk the array (++ on a pointer means: next element) until it reaches the end marker ('\0').
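To make the two notions concrete, here is a minimal sketch (the function names are hypothetical, chosen for this example). It shows that sizeof applied to an array still sees the whole array, while a function receiving the same array only gets a pointer and has to walk to the terminator.

```c
#include <stddef.h>

/* sizeof on an array gives the whole array, terminator included. */
size_t array_size_of_cat(void)
{
    char word[] = "cat";          /* 4 bytes: 'c', 'a', 't', '\0' */
    return sizeof word;
}

/* Inside a function the argument is only a pointer to the first
   element, so the length has to be found by walking. */
size_t walk_length(const char *p)
{
    const char *start = p;
    while (*p != '\0')
        ++p;                      /* step to the next element */
    return (size_t)(p - start);
}
```

The array occupies one byte more than the length found by walking, because the terminator is stored but not counted.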

Once *(str++) yields 0, the loop exits. This happens when str points to the terminating null character just after the last character of the string (because strings in C are 0-terminated).

Correct, str++ increases the pointer and returns its previous value. The asterisk (*) dereferences the pointer, i.e. it gives you the character value.
C strings end with a zero byte. The while loop exits when the conditional is no longer true, which means when it is zero.
So the while loop runs until it encounters a zero byte in the string.

Related

while loop with only parentheses syntax, in c

I just saw this "while(something);" syntax. I googled it but did not find anything. How does this work? The second while in the example code especially confuses me.
This code is a program that concatenates two strings using pointers.
#include <stdio.h>
#define MAX_SIZE 100 // Maximum string size
int main()
{
char str1[MAX_SIZE], str2[MAX_SIZE];
char * s1 = str1;
char * s2 = str2;
/* Input two strings from user */
printf("Enter first string: ");
gets(str1);
printf("Enter second string: ");
gets(str2);
/* !!!!!!!!!!!!!!!!! this is it!!!!!!!!!!!!!!!!!!!! Move till the end of str1 */
while(*(++s1));
/* !!!!!!!!!!!!!!!!! this is it!!!!!!!!!!!!!!!!!!!! Copy str2 to str1 */
while(*(s1++) = *(s2++));
printf("Concatenated string = %s", str1);
return 0;
}
The while loop is defined in C the following way
while ( expression ) statement
In this while loop
while(*(++s1));
the statement is a null statement. (The C Standard, 6.8.3 Expression and null statements)
3 A null statement (consisting of just a semicolon) performs no
operations.
So in the above while loop the expression is evaluated repeatedly until it becomes false.
Note that this while loop has a bug.
Let's assume that the pointed string is empty "". In memory it is represented the following way
{ '\0' }
So initially s1 points to the terminating zero.
But before dereferencing it is incremented in the expression of the while loop
while(*(++s1));
^^^^
and after that it points into the uninitialized part of the character array after the terminating zero '\0'. So the loop can invoke undefined behavior.
It would be more correct to rewrite it like
while( *s1 != '\0' ) ++s1;
In this case after the loop the pointer s1 will point to the terminating zero '\0' of the source string.
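As a minimal sketch of that corrected loop wrapped in a function (the name find_end is an assumption for this example, not something from the question):

```c
/* Returns a pointer to the terminating '\0' of s.
   The character is tested before the pointer moves, so an
   empty string is handled without reading past its terminator. */
char *find_end(char *s)
{
    while (*s != '\0')
        ++s;
    return s;
}
```

For an empty string the returned pointer equals the argument, which is exactly the case the original while(*(++s1)); gets wrong.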
This while loop where the statement is again a null statement
while(*(s1++) = *(s2++));
can be rewritten the following way
while( ( *s1++ = *s2++ ) != '\0' );
that is in essence the same as
while( ( *s1 = *s2 ) != '\0' )
{
++s1;
++s2;
}
(except that if the terminating zero was encountered and copied the pointers are not incremented)
That is, the result of the assignment ( *s1 = *s2 ) is the assigned character, which is then compared with the terminating zero character '\0'. If they are equal the loop stops, and at that point the string pointed to by s2 has been appended to the string pointed to by s1.
Note that the function gets is unsafe and is no longer part of the C Standard (it was removed in C11). Instead you should use the function fgets, for example
#include <string.h>
#include <stdio.h>
//...
printf("Enter first string: ");
fgets(str1, sizeof( str1 ), stdin );
str1[ strcspn( str1, "\n" ) ] = '\0';
The last statement removes the newline character '\n' that fgets stores in the buffer when the whole line fits.
You also need to check in the program whether there is enough space in the array str1 for the string stored in str2 to be appended to it.
while(*(++s1)); is an obfuscated and buggy way of writing while(*s1 != '\0') { s1++; }.
(while(*(s1++)); would at least test the first character, but that too is wrong, since it still increments the pointer when the test fails and so leaves s1 one past the terminator.)
while(*(s1++) = *(s2++)); is an obfuscated (and likely inefficient) way of writing strcpy(s1,s2);.
The whole program is an obfuscated way of writing strcat(s1, s2);. You can replace both of these buggy while loops with that single function call.
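Since strcat itself does not know how large the destination array is, a caller may want to check the sizes first. The following is only a sketch under that assumption; append_checked and its return convention (0 on success, -1 when the text does not fit) are invented for this example.

```c
#include <string.h>

/* Appends src to dst only if the result, including the terminator,
   fits in dst_size bytes. Returns 0 on success, -1 otherwise. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    if (used >= dst_size || need >= dst_size - used)
        return -1;                /* not enough room: dst is left unchanged */
    strcat(dst, src);
    return 0;
}
```

With the arrays from the question this could be called as append_checked(str1, sizeof str1, str2).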
Generally while(something); is bad practice, to the point where compilers might even warn for it, since it isn't clear if the semicolon ended up there on purpose or by a slip of the finger. Preferred style is either:
while(something)
; // aha this was surely not placed there by accident
or
while(something){}
or
while(something)
{}
++s1 advances (increments) the pointer before the while checks the value it points to.
The while loop iterates through the string until it reaches the null terminator, since the null character '\0' has the value 0, and while(0) is the same as while(false).
The loop
while(*(++s1));
doesn't need a body because everything is done inside the loop condition.
Therefore the loop body is an empty statement ;.
The loop consists of the following steps:
++s1 increment pointer
*(...) dereference pointer, i.e. get the data where the pointer points to.
use the value as the condition (0 is false, everything else is true)
The loop can be rewritten as
do
{
++s1;
}
while(*s1); // or while(*s1 != '\0');
Similarly, the other loop
while(*(s1++) = *(s2++));
can be written as
char c; /* declared outside the block so the loop condition can see it */
do
{
    *s1 = *s2;
    c = *s1;
    s1++;
    s2++;
}
while(c != '\0');
Note that the original loop condition contains an assignment (=), not a comparison (==). The assigned value is used as the loop condition.
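A minimal sketch of the assignment-as-condition idiom in a standalone function. The name copy_string and the returned count are assumptions added for illustration; the loop itself is the one discussed above.

```c
#include <stddef.h>

/* Copies src, terminator included, into dst and returns the number
   of characters copied, not counting the terminator.
   The caller must make sure dst is large enough. */
size_t copy_string(char *dst, const char *src)
{
    size_t copied = 0;
    /* The value of the assignment is the character just stored;
       the loop stops once that character is '\0'. */
    while ((*dst++ = *src++) != '\0')
        ++copied;
    return copied;
}
```

Writing the comparison with '\0' explicitly, as here, also makes it clear to the reader (and to the compiler) that the single = is intentional.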

Understanding character pointers in a while loop

I am learning C and I came across this function in my study materials. The function accepts a string pointer and a character and counts how many times that character occurs in the string. For example, for the string "this is a string" and ch = 'i' the function would return 3 for the 3 occurrences of the letter i.
The part I found confusing is in the while loop. I would have expected that to read something like while(buffer[j] != '\0') where the program would cycle through each element j until it reads a null value. I don't get how the while loop works using buffer in the while loop, and how the program is incremented character by character using buffer++ until the null value is reached. I tried to use debug, but it doesn't work for some reason. Thanks in advance.
int charcount(char *buffer, char ch)
{
int ccount = 0;
while(*buffer != '\0')
{
if(*buffer == ch)
ccount++;
buffer++;
}
return ccount;
}
buffer is a pointer to a set of chars, a string, or a memory buffer holding char data.
*buffer will dereference the value at buffer, as a char. This can be compared with the null character.
When you add to buffer - you are adding to the address, not the value it points to, buffer++ adds 1 to the address, pointing to the next char. This means that now *buffer results in the next character.
In the loop you are incrementing the pointer buffer until it points to the null character, at which point you know you scanned the whole string. Instead of buffer[j], which is equivalent to *(buffer+j), we are incrementing the pointer itself.
When you say buffer++ you increment the address stored in buffer by one.
Once you internalize how pointers work, this code is cleaner than the code that uses a separate index to scan the character string.
In C and C++, array elements are stored contiguously, so a pointer to the first element is enough to reach the others.
Therefore buffer holds the address of the first byte, and *buffer is synonymous with buffer[0]. Because of this, you can use buffer as an array, like this:
int charcount(char *buffer, char ch)
{
int ccount = 0;
int charno = 0;
while(buffer[charno] != '\0')
{
if(buffer[charno] == ch)
ccount++;
charno++;
}
return ccount;
}
Note that this works because strings are null terminated - if there is no null terminator in the character array pointed to by buffer, the loop will keep reading past the end of the array, which is undefined behavior; once an array is passed as a pointer, C no longer knows how long it is. This is why you see so many C functions to which you pass a pointer and a length - the pointer gives the [0] position of the array, and the size you specify tells it how far to keep reading.
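For comparison, here is a sketch of the pointer-plus-length style mentioned above. charcount_n is a hypothetical variant of the question's charcount, written for this example; it never looks for a terminator.

```c
#include <stddef.h>

/* Counts occurrences of ch in the first len bytes of buffer.
   No terminator is required because the length is passed in. */
int charcount_n(const char *buffer, size_t len, char ch)
{
    int ccount = 0;
    for (size_t i = 0; i < len; ++i) {
        if (buffer[i] == ch)
            ccount++;
    }
    return ccount;
}
```

This version also works on raw character data that has no '\0' at the end, as long as the caller passes the correct length.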
Hope this helps.

Don't understand how this for loop works

Can someone explain how this loop works? The entire function serves to figure out where in hash to place certain strings and the code is as follows:
//determine string location in hash
int hash(char* str)
{
int size = 100;
int sum = 0; /* must start at 0; an uninitialized sum has an indeterminate value */
for(; *str; str++)
sum += *str;
return sum % size;
}
It seems to iterate over the string character by character until it hits null; however, why does a simple *str work as a condition? Why does str++ move to the next character? Shouldn't it be something like *(str+i), where i increments with each iteration and moves i places in memory from the address in str?
In C, chars and integers used as conditions are treated as: 0 - false, non-zero - true.
So for(; *str; str++) iterates until *str is zero (the NUL character).
str is a pointer to an array of chars. str++ increments this pointer to point to the next element in the array and therefore the next character in the string.
So instead of using an index, you are moving the pointer.
The condition in a for loop is an expression that is tested for a zero value. The NUL character at the end of str is zero.
The more explicit form of this condition is of course *str != '\0', but that's equivalent since != produces zero when *str is equal to '\0'.
As for why str++ moves to the next character: that's how ++ is defined on pointers. When you increment a char*, you point it to the next char-sized cell in memory. Your *(str + i) solution would also work, it just takes more typing (even though it can be abbreviated str[i]).
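The two styles can be compared directly. This is only an illustrative sketch; sum_by_pointer and sum_by_index are names invented here, and both compute the same character sum that the hash function builds.

```c
/* Sums the character values by moving the pointer. */
int sum_by_pointer(const char *str)
{
    int sum = 0;
    for (; *str; str++)
        sum += *str;
    return sum;
}

/* Same sum using an index; str itself never changes. */
int sum_by_index(const char *str)
{
    int sum = 0;
    for (int i = 0; str[i] != '\0'; i++)
        sum += *(str + i);        /* identical to str[i] */
    return sum;
}
```

Both functions return the same value for any string; which one to use is mostly a matter of style.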
This for loop makes use of pointer arithmetic. With that you can increment/decrement the pointer or add/subtract an offset to navigate to certain entries in the array; since arrays are contiguous blocks of memory, you can do that.
str points to a string. Strings in C always end with a terminating \0.
*str dereferences the actual pointer to get the char value.
The for loop's continuation condition is equivalent to:
*str != '\0'
and
str++
moves the pointer forward to next element.
The whole for-loop is equivalent to:
int len = strlen(str);
int i;
for(i = 0; i < len; i++)
sum += str[i];
You could also write it as a while-loop:
while(*str)
sum += *str++;
Why does str++ moves to the next character, shouldn't it be something like this
instead: *(str+i) where i increments with each loop and moves "i" places in
memory based on *str address?
In C/C++, str here is a pointer variable that contains the address of the first character of your string. Initially str points to the first character, so *str returns the first character of the string.
After str++, str points to the second character, so *str returns the second character of the string.
why does simple *str works as a condition?
Every C/C++ string ends with a null character. This null character signifies the end of a character string in C. The value of the NUL character is 0.
In C/C++, 0 means false. Thus, a NUL character in a conditional statement
means a false condition.
for(;0;) /* 0 as the condition means false, hence the loop terminates
            when the pointer points to the null character */
{
}
It has to do with how C converts values to "True" and "False". In C, 0 is "False" and anything else is "True".
Since the null character also has the value zero, it evaluates to "False". If the character set were defined differently and the null character had a value of 11, the above loop wouldn't work! (The C standard requires the null character to have all bits set to 0, so this does not arise in practice.)
As for the 2nd half of the question, a pointer points to a "location" in memory. Incrementing that pointer makes it point to the next "location" in memory. The type of the pointer is relevant here too, because the "next" location depends on how big the thing being pointed to is.
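A short sketch of that last point, measuring how many bytes a pointer moves when it is incremented. The function names are made up for this example, and the actual size of int depends on the platform.

```c
#include <stddef.h>

/* Bytes between an int pointer and the same pointer after ++. */
size_t int_step(void)
{
    int values[2] = {0, 0};
    int *p = values;
    int *q = p;
    q++;                                  /* now points at values[1] */
    return (size_t)((char *)q - (char *)p);
}

/* Bytes between a char pointer and the same pointer after ++. */
size_t char_step(void)
{
    char letters[2] = {'a', 'b'};
    char *p = letters;
    char *q = p;
    q++;                                  /* now points at letters[1] */
    return (size_t)(q - p);
}
```

int_step() returns sizeof(int), whatever that is on the machine at hand, while char_step() always returns 1.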
When the pointer points to a null character, the dereferenced value *str is 0, and that is regarded as false.
This is simply because C treats 0 as false and everything else as true.
For example in the following code.
if(0) {
puts("true");
} else {
puts("false");
}
false will be the output
The unary * operator is a dereference operator -- *str means "the value pointed to by str." str is a pointer, so incrementing it with str++ (or ++str) changes the pointer to point to the next character. So it is the correct way to increment in the for loop.
Any integral value can be used as a condition. *str as the condition of the for loop takes the value pointed to by str and determines whether it is non-zero. If so, the loop continues. Once it hits a null character, it terminates.

While (*s) - How does this work?

How does this while loop work? When does the *s condition end the loop?
void putstr (char *s)
{
while (*s) putchar(*s++);
}
Are there other notable behaviors or arguments for while?
Logical expressions in C evaluate to false if they are 0, otherwise they evaluate to true. Thus your loop will terminate when *s is equal to 0. In the context of a char that is when the null-terminating character is encountered.
Note that ++ has a higher precedence than pointer dereferencing * and so the ++ is bound to the pointer rather than the char to which it points. Thus the body of your loop will call putchar for the character that s points to, and then increment the pointer s.
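The difference between *s++ and (*s)++ can be seen in a small sketch. Both helper functions below are hypothetical and exist only to make the effect observable from the caller.

```c
/* *p++ : returns the character p points at, then advances the pointer.
   The pointer is passed by address so the caller can see it move. */
char read_then_advance(const char **pp)
{
    return *(*pp)++;
}

/* (*p)++ : returns the character, then increments the character itself.
   The pointer does not move. */
char read_then_bump_char(char *p)
{
    return (*p)++;
}
```

With char buf[] = "ab", the first function yields 'a' and leaves the pointer on 'b'; the second also yields 'a' but changes buf[0] to 'b' and leaves the pointer where it was.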
*s dereferences into a char; in the loop, a zero (0, or '\0') acts as false, terminating the loop, while all non-zero characters keep it true.
The char (*s) is promoted to int; in conditions any integer != 0 is interpreted as true, so the loop ends when a '\0' char is encountered.
Because the loop itself modifies s (with *s++), the while condition can examine it each time around the loop, and it will eventually terminate, when the pointer points to a nul character.
while (*s)
while the character pointed to by s is not zero (that is, while we haven't reached the end of the string)
putchar(*s++);
it can be thought as
putchar(*s); // write the character pointed by s
s += 1; // go to next one
s is a pointer to a string.
The end of a string is detected by a 0 value.

How does this C code work?

I was looking at the following code I came across for printing a string in reverse order in C using recursion:
void ReversePrint(char *str) { //line 1
if(*str) { //line 2
ReversePrint(str+1); //line 3
putchar(*str); //line 4
}
}
I am relatively new to C and am confused by line 2. *str, from my understanding, dereferences the pointer and should return the character at the current position in the string. But how is this being used as an argument to a conditional statement (which should expect a boolean, right?)? In line 3, the pointer will always be incremented to the next block (4 bytes since it's an int)... so couldn't this code fail if there happens to be data in the next memory block after the end of the string?
Update: so there are no boolean types in C, correct? A conditional statement evaluates to 'false' if the value is 0, and 'true' otherwise?
Line 2 is checking to see if the current character is the null terminator of the string - since C strings are null-terminated, and the null character is considered a false value, it will begin unrolling the recursion when it hits the end of the string (instead of trying to call ReversePrint on the character after the null terminator, which would be beyond the bounds of the valid data).
Also note that the pointer is to a char, thus incrementing the pointer only increments by 1 byte (since char is a single-byte type).
Example:
 0  1  2  3
+--+--+--+--+
|f |o |o |\0|
+--+--+--+--+
When str = 0, then *str is 'f' so the recursive call is made for str+1 = 1.
When str = 1, then *str is 'o' so the recursive call is made for str+1 = 2.
When str = 2, then *str is 'o' so the recursive call is made for str+1 = 3.
When str = 3, then *str is '\0' and \0 is a false value thus if(*str) evaluates to false, so no recursive call is made, thus going back up the recursion we get...
Most recent recursion was followed by putchar('o'), then after that,
Next most recent recursion was followed by putchar('o'), then after that,
Least recent recursion was followed by putchar('f'), and we're done.
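The same unwinding order can be checked without printing. The sketch below uses a hypothetical helper, reverse_into, that follows the structure of ReversePrint but stores each character in a caller-supplied buffer instead of calling putchar.

```c
/* Writes the characters of str into out in reverse order and returns
   a pointer just past the last character written. No terminator is
   added; the caller does that. out must have room for the string. */
char *reverse_into(const char *str, char *out)
{
    if (*str) {
        out = reverse_into(str + 1, out);  /* handle the rest first */
        *out++ = *str;                     /* then emit this character */
    }
    return out;
}
```

For "foo", the innermost call (on the terminator) writes nothing, and the characters are then stored as 'o', 'o', 'f' while the recursion returns.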
The type of a C string is nothing but a pointer to char. The convention is that what the pointer points to is an array of characters, terminated by a zero byte.
*str, thus, is the first character of the string pointed to by str.
Using *str in a conditional evaluates to false if str points to the terminating null byte in the (empty) string.
At the end of a string is typically a 0 byte - the line if (*str) is checking for the existence of that byte and stopping when it gets to it.
In line 3, the pointer will always be incremented to the next block (4 bytes since it's an int)...
That's wrong; this is a char *, so it will only be incremented by 1, because char is only 1 byte long.
But how is this being used as an argument to a conditional statement (which should expect a boolean, right?)?
You can use any scalar value in if( $$ ) at $$, and it will only check whether it is non-zero; bool is basically also implemented as 1 = true and 0 = false.
In other, higher-level strongly typed languages you can't use such things in an if, but in C everything boils down to numbers, and you can use anything.
if(1) // evaluates to true
if("string") // evaluates to true
if(0) // evaluates to false
You can give any scalar value in if and while conditions in C.
At the end of the string there is a 0 - so you have "test" => [0]'t' [1]'e' [2]'s' [3]'t' [4]0
and if(0) -> false
this way this will work.
C originally had no dedicated boolean type (C99 added _Bool): every scalar type (i.e. arithmetic and pointer types) can be used in boolean contexts, where 0 means false and non-zero means true.
As strings are null-terminated, the terminator will be interpreted as false, whereas every other character (with non-zero value!) will be true. This means there's an easy way to iterate over the characters of a string:
for(;*str; ++str) { /* do something with *str */ }
ReversePrint() does the same thing, but by recursion instead of iteration.
conditional statements (if, for, while, etc) expect a boolean expression. If you provide an integer value the evaluation boils down to 0 == false or non-0 == true. As mentioned already, the terminating character of a c-string is a null byte (integer value 0). So the if will fail at the end of the string (or first null byte within the string).
As an aside, if you do *str on a NULL pointer you are invoking undefined behavior; you should always verify that a pointer is valid before dereferencing.
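A minimal sketch of such a check. The function safe_length and its choice to report 0 for a null pointer are assumptions made for this example; other designs (an error code, an assertion) are equally possible.

```c
#include <stddef.h>

/* Length of s, or 0 when s is a null pointer. */
size_t safe_length(const char *s)
{
    size_t len = 0;
    if (s == NULL)
        return 0;                 /* nothing to dereference */
    while (*s++)
        ++len;
    return len;
}
```

Note the two different "nulls" involved: s == NULL tests the pointer itself, while *s tests the character it points to.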
This is kind of off topic, but when I saw the question I immediately wondered if that was actually faster than just doing an strlen and iterate from the back.
So, I made a little test.
#include <string.h>
void reverse1(const char* str)
{
int total = 0;
if (*str) {
reverse1(str+1);
total += *str;
}
}
void reverse2(const char* str)
{
int total = 0;
size_t t = strlen(str);
while (t > 0) {
total += str[--t];
}
}
int main()
{
const char* str = "here I put a very long string ...";
int i=99999;
while (--i > 0) reverseX(str);
}
I first compiled it with X=1 (using the function reverse1) and then with X=2. Both times with -O0.
It consistently returned approximately 6 seconds for the recursive version and 1.8 seconds for the strlen version.
I think it's because strlen is implemented in assembler and the recursion adds quite an overhead.
I'm quite sure that the benchmark is representative, if I'm mistaken please correct me.
Anyway, I thought I should share this with you.
1.
str is a pointer to a char. Incrementing str will make the pointer point to the second character of the string (as it's a char array).
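NOTE: Incrementing a pointer advances it by the size of the type it points to.
For example (the sizes shown are typical, not guaranteed):
int *p_int;
p_int++; /* Advances by sizeof(int), typically 4 bytes */
double *p_dbl;
p_dbl++; /* Advances by sizeof(double), typically 8 bytes */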
2.
if(expression)
{
statements;
}
The expression is evaluated and if the resulting value is zero (a null pointer, '\0', 0), the statements are not executed. As every string ends with \0, the recursion has to end at some point.
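Try this index-based code, which is as simple as the one you are using:
int rev(int lower, int upper, char *string)
{
    if(lower > upper)
        return 0;                          /* range is empty: stop */
    putchar(string[upper]);                /* print the last character of the range */
    return rev(lower, upper - 1, string);  /* then the rest, shrinking from the top */
}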

Resources