C pointers: difference between while(*s++) { ;} and while(*s) { s++;}

I'm going through K & R, and am having difficulty with incrementing pointers. Exercise 5.3 (p. 107) asks you to write a strcat function using pointers.
In pseudocode, the function does the following:
1. Takes 2 strings as inputs.
2. Finds the end of string one.
3. Copies string two onto the end of string one.
I got a working answer:
void strcats(char *s, char *t)
{
    while (*s)              /* finds end of s */
        s++;
    while ((*s++ = *t++))   /* copies t to end of s */
        ;
}
But I don't understand why this code doesn't also work:
void strcats(char *s, char *t)
{
    while (*s++)
        ;
    while ((*s++ = *t++))
        ;
}
Clearly, I'm missing something about how pointer incrementation works. I thought the two forms of incrementing s were equivalent. But with the second version, printing s afterwards shows only the original string s.
I tried a dummy variable, i, to check whether the function went through both loops. It did. I read over sections 5.4 and 5.5 of K & R, but I couldn't find anything that sheds light on this.
Can anyone help me figure out why the second version of my function isn't doing what I would like it to? Thanks!
edit: Thanks everyone. It's incredible how long you can stare at a relatively simple error without noticing it. Sometimes there's no better remedy than having someone else glance at it.

This:
while(*s++)
;
due to post-increment, locates the nul byte at the end of the string, then increments s once more before exiting the loop. t is copied after the nul:
scontents␀tcontents␀
Printing s will stop at the first nul.
This:
while(*s)
s++;
breaks from the loop when the 0 is found, so you are left pointing at the nul byte. t is copied over the nul:
scontentstcontents␀
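The two end positions can be checked directly. A minimal sketch, assuming the hypothetical helper names `offset_after_postinc_scan` and `offset_after_body_scan`; each returns how far its loop moved the pointer, so the off-by-one becomes a number:

```c
#include <stddef.h>

/* Hypothetical helper: how far `while (*s++) ;` moves s. */
size_t offset_after_postinc_scan(const char *start)
{
    const char *s = start;
    while (*s++)
        ;                           /* the ++ also runs on the test that reads '\0' */
    return (size_t)(s - start);     /* one past the '\0' */
}

/* Hypothetical helper: how far `while (*s) s++;` moves s. */
size_t offset_after_body_scan(const char *start)
{
    const char *s = start;
    while (*s)
        s++;                        /* the body is skipped once *s is '\0' */
    return (size_t)(s - start);     /* the index of the '\0' */
}
```

For "scontents" the post-increment scan returns 10 and the body scan returns 9, which is exactly the one byte that separates the two layouts above.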

It's an off-by-one issue. Your second version increments the pointer every time the test is evaluated. The original increments one fewer time -- the last time when the test evaluates to 0, the increment isn't done. Therefore in the second version, the new string is appended after the original terminating \0, while in the first version, the first character of the new string overwrites that \0.
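To see where the bytes actually land, here is a sketch of both versions from the question run against the same buffer. The names `strcats_overshoot` and `strcats_fixed` are hypothetical, and the destination is assumed to be large enough for both strings plus the extra byte the first version skips:

```c
/* Second version from the question: the scan stops one past s's '\0'. */
void strcats_overshoot(char *s, const char *t)
{
    while (*s++)
        ;
    while ((*s++ = *t++))   /* copying starts after the old terminator */
        ;
}

/* First version from the question: the scan stops on s's '\0'. */
void strcats_fixed(char *s, const char *t)
{
    while (*s)
        s++;
    while ((*s++ = *t++))   /* the first char of t overwrites the '\0' */
        ;
}
```

Starting from a buffer holding "abc", appending "def" with the first function leaves the bytes a b c \0 d e f \0, so the string still reads "abc"; the second gives "abcdef".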

This:
while (*s)
s++;
stops as soon as *s is '\0', at which point it leaves s there (because it doesn't execute the body of the loop).
This:
while (*s++)
;
stops as soon as *s is '\0', but still executes the postincrement ++, so s ends up pointing right after the '\0'. So the string-terminating '\0' never gets overwritten, and it still terminates the string.

There's one fewer operation in while (*s) ++s;: when *s is zero, the loop breaks, while the form while (*s++) breaks but still increments s one last time.
Strictly speaking, that last increment leaves s one past the terminator. Forming that pointer is allowed (it is at most one past the end of the array), but dereferencing it, or advancing it further, is undefined behavior. This is contrived, of course, but here's an example: char x = 0, * p = &x; while (*p++) { } leaves p equal to &x + 1, which must not be dereferenced.
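A sketch of that contrived case, assuming the hypothetical function name `scan_single_nul`. A lone char behaves like an array of one element, so after the loop p equals &x + 1: it may be compared or subtracted, but not dereferenced.

```c
#include <stddef.h>

/* Hypothetical demo: scan a lone '\0' with the post-increment idiom. */
ptrdiff_t scan_single_nul(void)
{
    char x = 0;
    char *p = &x;
    while (*p++) { }
    /* p == &x + 1, a one-past-the-end pointer. Subtracting it (below) is
       fine; reading *p or doing p++ again would be undefined behavior. */
    return p - &x;
}
```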
Independent of that, it's best to write clean, readable and deliberate code rather than trying to outsmart yourself. Sometimes you can write nifty code in C that is actually elegant, and other times it's better to spell something out properly. Use your judgement, and ask someone else for feedback (or watch their faces as they look at your code).

Let's assume the following characters in memory:
Address   0x00   0x01   0x02   0x03
-------   ----   ----   ----   ----
0x8000    'a'    'b'    'c'    0
0x8004    ...
Executing while (*s++); with s starting at 0x8000 goes through these steps:
1. *s = 'a'
2. s = 0x8001
3. *s = 'b'
4. s = 0x8002
5. *s = 'c'
6. s = 0x8003
7. *s = 0;
8. s = 0x8004
9. end loop
While evaluating, *s++ advances the pointer even if the value of *s is 0.
// move s forward until it points one past a 0 character
while (*s++);
It doesn't work because s ends up pointing one byte past the '\0' instead of on it.
As a result, the copied string lands after the original '\0', which stays in place, so printing the target string shows only its original contents. The loop overshoots the '\0' by one step.
You can avoid this with the loop below, which I think is efficient:
while (*s)
s++;
In memory, it executes as follows:
1. *s = 'a'
2. s = 0x8001
3. *s = 'b'
4. s = 0x8002
5. *s = 'c'
6. s = 0x8003
7. *s = 0
8. end loop
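The two traces above can be reproduced with a small instrumented sketch. `trace_postinc` and `trace_body` are hypothetical names, `while (*s++)` is expanded by hand into "read, increment, then test", and offsets from the start of the string stand in for the example addresses:

```c
#include <stdio.h>
#include <stddef.h>

/* while (*s++) ; expanded by hand: read *s, increment s, then test. */
ptrdiff_t trace_postinc(const char *start)
{
    const char *s = start;
    char c;
    do {
        c = *s;   /* the value the loop condition tests */
        s++;      /* the post-increment runs even when c is 0 */
        printf("tested %d, s now at start+%td\n", c, s - start);
    } while (c != '\0');
    return s - start;   /* one past the '\0' */
}

/* while (*s) s++; the increment only runs after a non-zero test. */
ptrdiff_t trace_body(const char *start)
{
    const char *s = start;
    while (*s) {
        s++;
        printf("tested %d, s now at start+%td\n", s[-1], s - start);
    }
    printf("tested 0, end loop at start+%td\n", s - start);
    return s - start;   /* on the '\0' */
}
```

For "abc", trace_postinc returns 4 and trace_body returns 3, matching 0x8004 in the first trace and 0x8003 in the second.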

Related

Can someone explain me the use of --destination here?

I am completing my CISCO course on C and I have a question about the following function.
Can someone please explain the logic of the function, especially the use of --destination here?
char *mystrcat(char *destination, char *source)
{
    char *res;
    for(res = destination; *destination++; ) ;
    for(--destination; (*destination++ = *source++); ) ;
    return res;
}
The first loop is looking for the string terminator. When it finds it, with *destination being false, the pointer is still post-incremented by *destination++.
So the next loop starts by decrementing the pointer back to the '\0' terminator, to start the concatenation.
In the second loop, each character is copied until the string terminator is found with (*destination++ = *source++); which is evaluated as the loop control. Again, this will include the required string terminator being copied.
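For comparison, a sketch of the same function written so the first loop stops on the terminator itself, which removes the need for --destination. `mystrcat_plain` is a hypothetical name, and like the original it assumes destination has room for both strings:

```c
/* Same job as mystrcat, but the first loop stops on the '\0' itself. */
char *mystrcat_plain(char *destination, const char *source)
{
    char *res = destination;
    while (*destination)                  /* stop on destination's '\0' */
        destination++;
    while ((*destination++ = *source++))  /* copy, including source's '\0' */
        ;
    return res;                           /* start of the combined string */
}
```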
This is a very complicated way to write something that shouldn't need to be so difficult.
--destination is a weird feature of C. I'm assuming you already know that variable++ increments the variable by one. Similarly, variable-- decrements the variable by one. The thing is, when the ++ or -- comes after the variable name (postfix), the expression yields the variable's old value and the variable is updated as a side effect; when it comes before the variable (prefix), the variable is updated and the expression yields the new value.
For example:
int c = 5;
printf("%d\n", c++); /* prints 5 */
printf("%d\n", c);   /* prints 6 */
but
int d = 5;
printf("%d\n", ++d); /* prints 6 */
printf("%d\n", d);   /* prints 6 */
This is because in the second example, the prefix form yields the value after the increment.
Hope that helps.

What does while(*pointer) mean in C?

While recently reading a passage about C pointers, I found something interesting. It said that code like this:
char var[10];
char *pointer = var;   /* the passage wrote &var, which has type char (*)[10] */
while(*pointer!='\0'){
    //Something To loop
}
Can be turned into this:
//While Loop Part:
while(*pointer){
//Something to Loop
}
So my question is: what does *pointer mean?
while(x) {
do_something();
}
will run do_something() repeatedly as long as x is true. In C, "true" means "not zero".
'\0' is a null character. Numerically, it's zero (the bits that represent '\0' are the same as the number zero, just as a space is the number 0x20 = 32).
So you have while(*pointer != '\0'): while the pointed-to memory is not a zero byte. Earlier, I said "true" means "non-zero", so the comparison x != 0 (if x is int, short, etc.) or x != '\0' (if x is char) is the same as just x inside an if, while, etc.
Should you use this shorter form? In my opinion, no. It makes it less clear to someone reading the code what the intention is. If you write the comparison explicitly, it makes it a lot more obvious what the intention of the loop is, even if they technically mean the same thing to the compiler.
So if you write while(x), x should be a boolean or a C int that represents a boolean (a true-or-false concept) already. If you write while(x != 0), then you care about x being a nonzero integer and are doing something numerical with x. If you write while(x != '\0'), then x is a char and you want to keep going until you find a null character (you're probably processing a C string).
*pointer dereferences pointer: it yields the value stored at the location pointer points to. When pointer points to a string and is used in a while loop like while(*pointer), it is equivalent to while(*pointer != '\0'): loop until the null terminator is found.
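A short sketch showing that the two loop conditions from the question visit the same characters; both function names are hypothetical:

```c
#include <stddef.h>

/* Counts characters using the explicit comparison. */
size_t length_explicit(const char *pointer)
{
    size_t n = 0;
    while (*pointer != '\0') {
        n++;
        pointer++;
    }
    return n;
}

/* Same loop with the shorthand: *pointer is "true" while it is non-zero. */
size_t length_shorthand(const char *pointer)
{
    size_t n = 0;
    while (*pointer) {
        n++;
        pointer++;
    }
    return n;
}
```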
Let's start with a simple example:
int a = 2;
int *b = &a;
/* Run the loop while *b != 0, starting from 2.
   The loop runs twice,
   and then the condition becomes false.
*/
while( *b != 0 )
{
    (*b)--;   /* parentheses needed: *b-- would decrement the pointer b, not a */
}
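Postfix -- binds tighter than unary *, so *b-- means *(b--): it moves the pointer, not the value. A small sketch of both readings; `count_down` and `star_postdec_demo` are hypothetical names, and the two-element array exists only so the pointer has somewhere valid to move:

```c
#include <stddef.h>

/* Decrements the pointed-to value until it reaches 0; returns the count. */
int count_down(int *b)
{
    int iterations = 0;
    while (*b != 0) {
        (*b)--;            /* decrement the int, not the pointer */
        iterations++;
    }
    return iterations;
}

/* Shows how *p-- parses: it reads *p, then moves p back one element. */
void star_postdec_demo(int *value_read, ptrdiff_t *new_index)
{
    int arr[2] = {10, 20};
    int *p = &arr[1];
    *value_read = *p--;    /* reads arr[1] (20); p now points at arr[0] */
    *new_index = p - arr;  /* 0: the pointer moved, the ints did not */
}
```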
Similarly, your code is working with char*, a string.
char var[10];
/* copy some string of at most 9 chars into var,
   and terminate it with a '\0' char. */
char *pointer = var;   /* not &var, which has type char (*)[10] */
while( *pointer != '\0' )
{
    // do something
    // increment the pointer by 1 (or some other valid amount)
}
So the while loop will run until *pointer hits '\0'.
while( *pointer )
/* The above statement means the same as while( *pointer != '\0' ),
because, null char ('\0') = decimal value, numeric zero, 0*/
But the meaning changes when you write while(*pointer != 'x'), where x can be any char. In that case, the explicit loop exits when *pointer hits the 'x' char, but the shorthand while(*pointer) still runs until *pointer hits the '\0' char.
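When the stop character is something other than '\0', a loop like while (*pointer != 'x') would also run past the terminator if the string contains no 'x'. A sketch that checks for both, using the hypothetical name `index_of_or_end`:

```c
#include <stddef.h>

/* Returns the index of the first `stop` in s, or the index of the '\0'
   if `stop` never appears. Testing *p first keeps the scan inside s. */
size_t index_of_or_end(const char *s, char stop)
{
    const char *p = s;
    while (*p && *p != stop)
        p++;
    return (size_t)(p - s);
}
```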
Yes, you can go for it.
Please note that *pointer is the value at the memory location the pointer points to (holds the address of).
Your pointer now points to the individual characters of the character array var.
So, while(*pointer) is shorthand usage of the equivalent
while(*pointer!='\0').
Suppose your string is initialized to the 9 characters "123456789" and located at some address, say addr.
Now, because of the statement:
char *pointer = var;
pointer will point to the first element of the string "123456789".
When you write *pointer, it retrieves the value stored at memory location addr, which is '1'.
Now, the statement:
while(*pointer)
will be equivalent to
while(49)
because ASCII Value of 1 is 49, and condition is evaluated to true.
This will continue until the \0 character is reached, after pointer has been incremented nine times.
Now, the statement:
while(*pointer)
will be equivalent to
while(0)
because ASCII value of \0 is 0. Thus, condition is evaluated to false and loop stops.
Summary:
In while(condition), condition must be non-zero to continue loop execution. If condition evaluates to zero then loop stops executing.
while(*pointer) will work till the value at memory location being pointed to is a non-zero ASCII value.
Also you can use:
if(*ptr){    //instead of if(*ptr!='\0')
    //do something
}
if(!*ptr){   //instead of if(*ptr=='\0')
    //do something
}
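The same shorthand works in return statements. A sketch with the hypothetical helpers `is_empty` and `has_text`, assuming C99 `<stdbool.h>`:

```c
#include <stdbool.h>

/* A string is empty when its first char is already the terminator. */
bool is_empty(const char *ptr)
{
    return !*ptr;          /* same as *ptr == '\0' */
}

bool has_text(const char *ptr)
{
    return *ptr;           /* same as *ptr != '\0'; converts to true/false */
}
```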
*pointer means exactly what it says: "Give me the value that's stored at the place that the pointer points to". Or "dereference pointer" for short. In your concrete example, dereferencing the pointer produces one of the characters in a string.
while(*pointer) also means exactly what it says: "While the expression *pointer yields a true value, execute the body of the loop".
Since C considers all non-zero values as true, using *pointer in a condition is always equivalent to using the expression *pointer != 0. Consequently, many C programmers omit the != 0 part in order to practice boolean zen.

Why isn't this pointing to the null character in array? ('\0')

Sorry about the poorly worded question, I couldn't think of a better name.
I am learning C, have just moved onto pointers and have written a function, strcat(char *s, char *t), which adds t to the end of s:
void strcat(char *s, char *t) //add t to the end of s
{
    while(*s++)          //get to the end of s
        ;
    *s--;                //unsure why I need this
    while(*s++ = *t++)   //copy t to the end of s
        ;
    return;
}
Now the question I have is why do I need the line:
*s--;
When I originally added it I thought it made sense until I went through the code.
I would have thought the following was true though:
1) The first loop increments continually and when *s is 0 (or the null character) it moves on so now *s points to the null character of the array.
2) So all I should have to do is implement the second loop. The original null character of s will be replaced by the first character of t until we get to t's null character at which point we exit the second loop and returns.
Clearly I am missing something as the code doesn't work without it!!
After the first loop, s points one position beyond '\0', but my question is why?
Thanks in advance :)
First *s is evaluated then s is incremented.
So when reaching s's 0-terminator the loop ends, but s still is incremented one more time.
Also there is no need to do:
*s--;
Doing
--s;
or
s--;
would be enough. There is no need to de-reference s here.
Or simply do
while (*s)
++s;
to avoid needing --s at all.
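A sketch of the first fix from this answer: keep the post-increment scan, then step back once without dereferencing. `append_step_back` is a hypothetical name, and s is assumed to have room for t:

```c
/* Post-increment scan, then move back onto s's '\0' before copying. */
void append_step_back(char *s, const char *t)
{
    while (*s++)
        ;
    --s;                    /* just move the pointer; no dereference needed */
    while ((*s++ = *t++))   /* t's first char overwrites the old '\0' */
        ;
}
```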
You incremented the pointer after checking the value of the location it was pointing at. Functionally this is happening in while( *s++ ):
while( *s )
    ++s;
++s;   /* the test that sees '\0' still increments s once more */
Change your first while to:
if (*s) {
    while(*(++s)) //get to the end of s
        ;
}
In your code, you always check whether s points to '\0' and then increment it, so when you reach the '\0' you test it and still increment past it. Note that with pre-increment, the character s starts on is never tested, so you need to check it before the while.
Note that your code (post-increment and a decrement after the while) might be faster on most platforms (a branch is often slower than a decrement); the code in this answer is just to help you understand the problem.
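A sketch of this answer's pre-increment version as a complete function, under the hypothetical name `append_preinc`; the `if` is what makes an empty s work, because `*(++s)` never tests the character s starts on:

```c
/* Pre-increment scan: the if handles an empty s. */
void append_preinc(char *s, const char *t)
{
    if (*s) {
        while (*(++s))      /* step first, then test the new character */
            ;
    }
    while ((*s++ = *t++))   /* s is on its '\0' here in both cases */
        ;
}
```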
The ++ operator after the variable name does postincrement, which means it increments by one, but the result of the operator is the value before the increment. If you used ++s, it would be different.
If s is 4, then s will be 5 after x=++s as well as after x=s++. But the result (value of x) in the first case is 5, while it's 4 in the second case.
So in your while (*s++), when s points to the '\0', you increment it, then take the old, un-incremented pointer, dereference it, see the \0, and stop the loop.
Btw, your '*s--' should be s-- because you don't need to read the character the pointer points to there.
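The x=++s versus x=s++ example, sketched with hypothetical functions that report both the variable and the expression's result:

```c
/* x = ++s: s is incremented first, and x gets the new value. */
void prefix_demo(int *s, int *x)
{
    *x = ++*s;
}

/* x = s++: x gets the old value, and s is incremented as a side effect. */
void postfix_demo(int *s, int *x)
{
    *x = (*s)++;
}
```

Starting from s = 4, both leave s at 5, but x is 5 after prefix_demo and 4 after postfix_demo.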

Finding end of string: *s++ VS *s then s++

I'm writing a simple string concatenation program.
The program works the way I have posted it. However, I first wrote it using the following code to find the end of the string:
while (*s++)
;
However, that method didn't work. The strings I passed to it weren't copied correctly. Specifically, I tried to copy "abc" to a char[] variable that held "\0".
From reading the C K&R book, it looks like it should work. That compact form should take the following steps.
*s is compared with '\0'
s points to the next address
So why doesn't it work? I am compiling with gcc on Debian.
I found that this version does work:
char *strncat(char *s, const char *t, int n)
{
    char *s_start = s;
    while (*s)
        s++;
    for ( ; n > 0 && *t; n--, s++, t++)
        *s = *t;
    *s = '\0';
    return s_start;
}
Thanks in advance.
After the end of while (*s++);, s points to the character after the null terminator. Take that into account in the code that follows.
The problem is that
while (*s++)
;
always increments s, even when *s is zero (false)
while (*s)
s++;
only increments s when *s is nonzero
so the first one will leave s pointing to first character after the first \0, while the second one will leave s pointing to the first \0.
There is a difference. In the first case, s will point to the position after '\0', while the second stops right at '\0'.
As John Knoeller said, at the end of the run s will point to the location after the null terminator. But there is no need to sacrifice performance for a correct solution. Take a look for yourself:
while (*s++); --s;
Should do the trick.
In addition to what has been said, note that in C it is undefined behavior even to form a pointer more than one element past the end of an array, whether or not you dereference it; a pointer exactly one past the end is allowed, but must not be dereferenced. So be sure to fix your program, even if it appears to work.

How does this C code work?

I was looking at the following code I came across for printing a string in reverse order in C using recursion:
void ReversePrint(char *str) {   //line 1
    if(*str) {                   //line 2
        ReversePrint(str+1);     //line 3
        putchar(*str);           //line 4
    }
}
I am relatively new to C and am confused by line 2. *str from my understanding is dereferencing the pointer and should return the value of the string in the current position. But how is this being used as an argument to a conditional statement (which should expect a boolean right?)? In line 3, the pointer will always be incremented to the next block (4 bytes since it's an int)...so couldn't this code fail if there happens to be data in the next memory block after the end of the string?
Update: so there are no boolean types in c correct? A conditional statement evaluates to 'false' if the value is 0, and 'true' otherwise?
Line 2 is checking to see if the current character is the null terminator of the string. Since C strings are null-terminated, and the null character is considered a false value, it will begin unrolling the recursion when it hits the end of the string (instead of trying to call ReversePrint on the character after the null terminator, which would be beyond the bounds of the valid data).
Also note that the pointer is to a char, thus incrementing the pointer only increments by 1 byte (since char is a single-byte type).
Example:
 0  1  2  3
+--+--+--+--+
|f |o |o |\0|
+--+--+--+--+
When str = 0, then *str is 'f' so the recursive call is made for str+1 = 1.
When str = 1, then *str is 'o' so the recursive call is made for str+1 = 2.
When str = 2, then *str is 'o' so the recursive call is made for str+1 = 3.
When str = 3, then *str is '\0' and \0 is a false value thus if(*str) evaluates to false, so no recursive call is made, thus going back up the recursion we get...
Most recent recursion was followed by putchar('o'), then after that,
Next most recent recursion was followed by putchar('o'), then after that,
Least recent recursion was followed by putchar('f'), and we're done.
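The same recursion can write into a buffer instead of calling putchar, which makes the "oof" result checkable. A sketch under the hypothetical name `reverse_into`; it returns the next free slot, and the caller is assumed to supply a large enough buffer and add the terminator:

```c
/* Recursive reverse like ReversePrint, but appending to out instead of
   printing. Returns the position just after the last char written. */
char *reverse_into(const char *str, char *out)
{
    if (*str) {                            /* stop at the '\0' */
        out = reverse_into(str + 1, out);  /* emit the rest of the string first */
        *out++ = *str;                     /* then this character */
    }
    return out;
}
```

Usage: `char buf[8]; *reverse_into("foo", buf) = '\0';` leaves "oof" in buf.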
The type of a C string is nothing but a pointer to char. The convention is that what the pointer points to is an array of characters, terminated by a zero byte.
*str, thus, is the first character of the string pointed to by str.
Using *str in a conditional evaluates to false if str points to the terminating null byte in the (empty) string.
At the end of a string is typically a 0 byte - the line if (*str) is checking for the existence of that byte and stopping when it gets to it.
In line 3, the pointer will always be incremented to the next block (4 bytes since it's an int)...
That's wrong: this is a char *, so it is only incremented by 1, because a char is 1 byte (sizeof(char) is 1 by definition).
But how is this being used as an argument to a conditional statement (which should expect a boolean right?)?
You can use any scalar value (number or pointer) in if( $$ ) at $$, and it only checks whether it is non-zero; bool itself is just 1 = true and 0 = false.
In other, more strongly typed languages you can't use such things in if, but in C everything boils down to numbers, so any number or pointer works.
if(1)         // evaluates to true
if("string")  // evaluates to true (a non-null pointer)
if(0)         // evaluates to false
You can put any scalar expression in if and while conditions in C.
At the end of the string there is a 0, so "test" is [0]'t' [1]'e' [2]'s' [3]'t' [4]0
and if(0) is false,
which is how this works.
C traditionally had no dedicated boolean type (C99 added _Bool): every scalar type (i.e. arithmetic and pointer types) can be used in boolean contexts, where 0 means false and non-zero means true.
As strings are null-terminated, the terminator will be interpreted as false, whereas every other character (with non-zero value!) will be true. This means there's an easy way to iterate over the characters of a string:
for(;*str; ++str) { /* do something with *str */ }
ReversePrint() does the same thing, but by recursion instead of iteration.
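For comparison, an iterative sketch of the reversal: the for loop above finds the terminator, then a second loop walks back to the start. `reverse_into_iterative` is a hypothetical name; as with any such helper, the caller supplies the buffer and the final '\0':

```c
/* Walk to the '\0' with the for loop, then copy characters back to front.
   Returns the position just after the last char written. */
char *reverse_into_iterative(const char *str, char *out)
{
    const char *end = str;
    for (; *end; ++end)
        ;                      /* end now points at the '\0' */
    while (end != str)
        *out++ = *--end;       /* step back first, then copy */
    return out;
}
```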
Conditional statements (if, for, while, etc.) expect a boolean expression. If you provide an integer value, the evaluation boils down to 0 == false, non-0 == true. As mentioned already, the terminating character of a C string is a null byte (integer value 0), so the if will fail at the end of the string (or at the first null byte within the string).
As an aside, if you do *str on a NULL pointer you are invoking undefined behavior; you should always verify that a pointer is valid before dereferencing.
This is kind of off topic, but when I saw the question I immediately wondered if that was actually faster than just doing an strlen and iterate from the back.
So, I made a little test.
#include <string.h>

void reverse1(const char* str)
{
    int total = 0;
    if (*str) {
        reverse1(str+1);
        total += *str;
    }
}

void reverse2(const char* str)
{
    int total = 0;
    size_t t = strlen(str);
    while (t > 0) {
        total += str[--t];
    }
}

int main()
{
    const char* str = "here I put a very long string ...";
    int i=99999;
    while (--i > 0) reverseX(str);  /* replace reverseX with reverse1 or reverse2 */
}
I first compiled it with X=1 (using the function reverse1) and then with X=2. Both times with -O0.
It consistently returned approximately 6 seconds for the recursive version and 1.8 seconds for the strlen version.
I think it's because strlen is implemented in assembler and the recursion adds quite an overhead.
I'm quite sure that the benchmark is representative, if I'm mistaken please correct me.
Anyway, I thought I should share this with you.
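One caveat with this benchmark: total is computed and then discarded, so with optimization enabled a compiler may remove the work entirely. A sketch of variants that return the sum instead, so the work stays observable and the two can be checked for agreement; the names are hypothetical and no timing figures are implied:

```c
#include <string.h>

/* Recursive sum of the characters, returned so the work is observable. */
int sum_recursive(const char *str)
{
    if (*str)
        return *str + sum_recursive(str + 1);
    return 0;
}

/* Iterative sum: strlen, then walk from the back, as in reverse2. */
int sum_iterative(const char *str)
{
    int total = 0;
    size_t t = strlen(str);
    while (t > 0)
        total += str[--t];
    return total;
}
```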
1.
str is a pointer to a char. Incrementing str will make the pointer point to the second character of the string (as it's a char array).
NOTE: Incrementing a pointer advances it by the size of the type it points to.
For ex:
int ints[2], *p_int = ints;
p_int++; /* advances by sizeof(int) bytes, typically 4 */
double dbls[2], *p_dbl = dbls;
p_dbl++; /* advances by sizeof(double) bytes, typically 8 */
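The step size can be measured rather than assumed. A sketch with hypothetical helpers that convert both pointers to char * and subtract, giving the distance in bytes:

```c
#include <stddef.h>

/* Bytes between p and p + 1 for an int pointer. */
size_t int_step_bytes(void)
{
    int ints[2];
    int *p = ints;
    return (size_t)((char *)(p + 1) - (char *)p);   /* == sizeof(int) */
}

/* Bytes between p and p + 1 for a double pointer. */
size_t double_step_bytes(void)
{
    double dbls[2];
    double *p = dbls;
    return (size_t)((char *)(p + 1) - (char *)p);   /* == sizeof(double) */
}
```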
2.
if(expression)
{
statements;
}
The expression is evaluated and if the resulting value is zero (NULL, \0, 0), the statements are not executed. As every string ends with \0 the recursion will have to end some time.
Try this code, which is as simple as the one which you are using:
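/* Reverse string[lower..upper] in place: swap the ends, then recurse inward. */
int rev(int lower, int upper, char *string)
{
    char tmp;
    if (lower >= upper)
        return 0;
    tmp = string[lower];
    string[lower] = string[upper];
    string[upper] = tmp;
    return rev(lower + 1, upper - 1, string);
}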
